网站首页 > 技术文章正文

MySQL索引（java下一页）

nanyue 2024-10-02 17:34:14 技术文章 11 ℃

索引

在数据库中索引最核心的作用是：加速查找

例如在含有300万条数据的表中查询，无索引需要700秒，而利用索引可能仅需1秒。

在开发过程中会为哪些经常会被搜索的列创建索引，以提高程序的响应速度。例如：查询手机号、邮箱、用户名等。

索引原理

为什么加上索引之后速度能有这么大的提升呢？因为索引的底层是基于B+Tree的数据结构存储的。

很明显，如果有了索引结构的查询效率比表中逐行查询的速度要快很多且数据量越大越明显。

B+Tree结构连接：https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

数据库的索引是基于上述B+Tree的数据结构实现，但在创建数据库表时，如果指定不同的引擎，底层使用的B+Tree结构的原理有些不同。

myisam引擎，非聚簇索引（数据和索引结构分开储存）
innodb引擎，聚簇索引（数据和主键索引结构存储在一起）

非聚簇索引

create table 表名(
    id int not null auto_increment primary key, 
    name varchar(32) not null,
    age int
)engine=myisam default charset=utf8;

假设要找一个表id为6的信息，基于id的索引6找到对应的存储id=6内存地址的节点，拿到真正存储数据的内存地址，基于内存地址直接定位取到数据。

普通索引和非聚集索引没什么区别。
叶子节点存放的是：磁盘地址。

叶节点的data域存放的是数据记录的地址。MyISAM的索引与行记录是分开存储的，叫做非聚集索引（UnClustered Index）。可以想象成一本书的目录和内容是分开的。

myisam引擎文件存储会创建三个文件：

表结构，数据，索引结构

聚簇索引

基于innodb引擎创建表的时候，实际上是没有表的，而是将主键通过树形结构存起来，而主键存的树形结构的最下面的叶子节点不仅存储了主键，还把这一行的数据信息也存储在了这里。

下图叶节点包含了完整的数据记录InnoDB的数据文件本身就是索引文件，而MyISAM索引文件和数据文件是分离的，索引文件仅保存数据记录的地址。而在InnoDB中，表数据文件本身就是按B+Tree组织的一个索引结构，这棵树的叶节点data域保存了完整的数据记录。这个索引的key是表的主键，因此InnoDB表数据文件本身就是主键索引。

这里InnoDB必须要有主键，因为数据文件要根据主键才能组合成一个索引结构，所以InnoDB要求表必须有主键（MyISAM可以没有），如果没有显式指定，则MySQL系统会自动选择一个可以唯一标识数据记录的列作为主键，如果不存在这种列，则MySQL自动为InnoDB表生成一个隐含字段作为主键，这个字段长度为6个字节，类型为长整形

假设想要对一个name列也创建个索引，那么在innofb引擎下会创建一个辅助索引，它会生成另外一个B+Tree的结构并按照name列进行存放，只是在树形结构存放完，在结构最下面的节点不会把行数据再存放一次，而是存放的主键id。

它的查找则是，根据name找到对应的主键id，再根据主键id去数据索引文件里把数据找到。

例如当普通索引通过xiaohuang查到id为1后，再到聚集索引中继续查询到子节点，也就是说通过普通索引查询会调用两次索引。

innodb引擎文件存储会创建两个文件：

表结构，数据和索引结构

常见索引

在开发过程中常见的索引类型有：

主键索引：加速查找、不能为空、不能重复。 + 联合主键索引
唯一索引：加速查找、不能重复。 + 联合唯一索引
普通索引：加速查找。 + 联合索引

主键和联合主键索引

create table 表名(
    id int not null auto_increment primary key,   -- 主键
    name varchar(32) not null
);

create table 表名(
    id int not null auto_increment,
    name varchar(32) not null,
    primary key(id)
);

create table 表名(
    id int not null auto_increment,
    name varchar(32) not null,
    primary key(列1,列2)          -- 如果有多列，称为联合主键（不常用且myisam引擎支持）
);

alter table 表名 add primary key(列名);

alter table 表名 drop primary key;

注意：删除索引时可能会报错，自增列必须定义为键。

#报错
ERROR 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key

# 如果带有自增，修改表
alter table 表 change id id int not null;

#没有自增的话，直接执行删除
create table t7(
    id int not null,
    name varchar(32) not null,
    primary key(id)
);

alter table t6 drop primary key;

唯一和联合唯一索引

create table 表名(
    id int not null auto_increment primary key,
    name varchar(32) not null,
    email varchar(64) not null,
    unique ix_name (name),
    unique ix_email (email),
);

create table 表名(
    id int not null auto_increment,
    name varchar(32) not null,
    unique (列1,列2)               -- 如果有多列，称为联合唯一索引。
);

create unique index 索引名 on 表名(列名);

drop unique index 索引名 on 表名;

索引和联合索引

create table 表名(
    id int not null auto_increment primary key,
    name varchar(32) not null,
    email varchar(64) not null,
    index ix_email (email),
    index ix_name (name),
);

create table 表名(
    id int not null auto_increment primary key,
    name varchar(32) not null,
    email varchar(64) not null,
    index ix_email (name,email)     -- 如果有多列，称为联合索引。
);

create index 索引名 on 表名(列名);

drop index 索引名 on 表名;

案例：博客系统

每张表id列都创建自增 + 主键。
用户表
用户名 + 密码创建联合索引。
手机号，创建唯一索引。
邮箱，创建唯一索引。
推荐表
user_id和article_id创建联合唯一索引

操作表

在表中创建索引后，查询时一定要命中索引。

在数据库的表中创建索引之后优缺点如下：

优点：查找速度快、约束（唯一、主键、联合唯一）
缺点：插入、删除、更新速度比较慢，因为每次操作都需要调整整个B+Tree的数据结构关系。

所以，在表中不要无节制的去创建索引！！！

在开发中，我们会对表中经常被搜索的列创建索引，从而提高程序的响应速度。

CREATE TABLE `big` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(32) DEFAULT NULL,
    `email` varchar(64) DEFAULT NULL,
    `password` varchar(64) DEFAULT NULL,
    `age` int(11) DEFAULT NULL,
    PRIMARY KEY (`id`),                       -- 主键索引
    UNIQUE KEY `big_unique_email` (`email`),  -- 唯一索引
    index `ix_name_pwd` (`name`,`password`)     -- 联合索引
) ENGINE=InnoDB DEFAULT CHARSET=utf8

一般情况下，我们针对只要通过索引列去搜搜都可以命中索引（通过索引结构加速查找）。

select * from big where id = 5;
select * from big where id > 5;
select * from big where email = "wupeiqi@live.com";
select * from big where name = "武沛齐";
select * from big where name = "kelly" and password="ffsijfs";

但是，还是会有一些特殊的情况，让我们无法命中索引（即使创建了索引），这也是需要大家在开发中要注意的。

类型不一致

select * from big where name = 123;		-- 未命中
select * from big where email = 123;	-- 未命中

特殊的主键：
	select * from big where id = "123";	-- 命中

使用不等于

select * from big where name != "武沛齐";				-- 未命中
select * from big where email != "wupeiqi@live.com";  -- 未命中

特殊的主键：
	select * from big where id != 123;	-- 命中

or，当or条件中有未建立索引的列才失效。

select * from big where id = 123 or password="xx";			-- 未命中
select * from big where name = "wupeiqi" or password="xx";	-- 未命中
特别的：
	select * from big where id = 10 or password="xx" and name="xx"; -- 命中

排序，当根据索引排序时候，选择的映射如果不是索引，则不走索引。

select * from big order by name asc;     -- 未命中
select * from big order by name desc;    -- 未命中

特别的主键：
	select * from big order by id desc;  -- 命中

like，模糊匹配时。

select * from big where name like "%u-12-19999";	-- 未命中
select * from big where name like "_u-12-19999";	-- 未命中
select * from big where name like "wu-%-10";		-- 未命中

特别的：
	select * from big where name like "wu-1111-%";	-- 命中
	select * from big where name like "wuw-%";		-- 命中

使用函数

select * from big where reverse(name) = "wupeiqi";  -- 未命中

特别的：
	select * from big where name = reverse("wupeiqi");  -- 命中

最左前缀，如果是联合索引，要遵循最左前缀原则。

如果联合索引为：(name,password)
    name and password       -- 命中
    name                 	-- 命中
    password                -- 未命中
    name or password       	-- 未命中

常见的无法命中索引的情况就是上述的示例。

对于大家来说会现在的最大的问题是，记不住，哪怎么办呢？接下来看执行计划。

执行计划

MySQL中提供了执行计划，让你能够预判SQL的执行（只能给到一定的参考，不一定完全能预判准确）。

explain + SQL语句;

其中比较重要的是 type，他SQL性能比较重要的标志，性能从低到高依次：all < index < range < index_merge < ref_or_null < ref < eq_ref < system/const

ALL，全表扫描，数据表从头到尾找一遍。(一般未命中索引，都是会执行全部扫描)

select * from big;

特别的：如果有limit，则找到之后就不在继续向下扫描.
	select * from big limit 1;

INDEX，全索引扫描，对索引从头到尾找一遍

explain select id from big;
explain select name from big;

RANGE，对索引列进行范围查找

explain select * from big where id > 10;
explain select * from big where id in (11,22,33);
explain select * from big where id between 10 and 20;
explain select * from big where name > "wupeiqi" ;

INDEX_MERGE，合并索引，使用多个单列索引搜索

explain select * from big where id = 10 or name="武沛齐";

REF，根据索引直接去查找（非键）

select *  from big where name = '武沛齐';

EQ_REF，连表操作时常见。

explain select big.name,users.id from big left join users on big.age = users.id;

CONST，常量，表最多有一个匹配行,因为仅有一行,在这行的列值可被优化器剩余部分认为是常数,const表很快。

explain select * from big where id=11;					-- 主键
explain select * from big where email="w-11-0@qq.com";	-- 唯一索引

SYSTEM，系统，表仅有一行(=系统表)。这是const联接类型的一个特例

 explain select * from (select * from big where id=1 limit 1) as A;

其他列：

id，查询顺序标识

z，查询类型
    SIMPLE          简单查询
    PRIMARY         最外层查询
    SUBQUERY        映射为子查询
    DERIVED         子查询
    UNION           联合
    UNION RESULT    使用联合的结果
    ...
    
table，正在访问的表名

partitions，涉及的分区（MySQL支持将数据划分到不同的idb文件中，详单与数据的拆分）。 一个特别大的文件拆分成多个小文件（分区）。

possible_keys，查询涉及到的字段上若存在索引，则该索引将被列出，即：可能使用的索引。
key，显示MySQL在查询中实际使用的索引，若没有使用索引，显示为NULL。例如：有索引但未命中，则possible_keys显示、key则显示NULL。

key_len，表示索引字段的最大可能长度。(类型字节长度 + 变长2 + 可空1)，例如：key_len=195，类型varchar(64)，195=64*3+2+1

ref，连表时显示的关联信息。例如：A和B连表，显示连表的字段信息。

rows，估计读取的数据行数（只是预估值）
	explain select * from big where password ="025dfdeb-d803-425d-9834-445758885d1c";
	explain select * from big where password ="025dfdeb-d803-425d-9834-445758885d1c" limit 1;
filtered，返回结果的行占需要读到的行的百分比。
	explain select * from big where id=1;  -- 100，只读了一个1行，返回结果也是1行。
	explain select * from big where password="27d8ba90-edd0-4a2f-9aaf-99c9d607c3b3";  -- 10，读取了10行，返回了1行。
	注意：密码27d8ba90-edd0-4a2f-9aaf-99c9d607c3b3在第10行
	
extra，该列包含MySQL解决查询的详细信息。
    “Using index”
    此值表示mysql将使用覆盖索引，以避免访问表。不要把覆盖索引和index访问类型弄混了。
    “Using where”
    这意味着mysql服务器将在存储引擎检索行后再进行过滤，许多where条件里涉及索引中的列，当（并且如果）它读取索引时，就能被存储引擎检验，因此不是所有带where子句的查询都会显示“Using where”。有时“Using where”的出现就是一个暗示：查询可受益于不同的索引。
    “Using temporary”
    这意味着mysql在对查询结果排序时会使用一个临时表。
    “Using filesort”
    这意味着mysql会对结果使用一个外部索引排序，而不是按索引次序从表里读取行。mysql有两种文件排序算法，这两种排序方式都可以在内存或者磁盘上完成，explain不会告诉你mysql将使用哪一种文件排序，也不会告诉你排序会在内存里还是磁盘上完成。
    “Range checked for each record(index map: N)”
    这个意味着没有好用的索引，新的索引将在联接的每一行上重新估算，N是显示在possible_keys列中索引的位图，并且是冗余的。

上一篇：探索解密什么是MySQL索引的底层原理
下一篇： mybatis foreach 属性及其三种使用情况

网站首页 > 技术文章 正文