mysql – Why is a secondary index chosen over the clustered index for SELECT COUNT(*)…?

Author: Internet

For this query:

select count(*) from largetable;

the optimizer chooses a secondary index:

mysql> explain select count(*) from largetable;
+----+-------------+------------+-------+---------------+------+---------+------+----------+-------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+------------+-------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | largetable | index | NULL          | iif  | 5       | NULL | 50000169 | Using index |
+----+-------------+------------+-------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from largetable;
+----------+
| count(*) |
+----------+
| 50000000 |
+----------+
1 row in set (5 min 52.02 sec)

Forcing the clustered index:

select count(*) from largetable force index (primary);

gives better performance:

mysql> explain select count(*) from largetable force index (primary);
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | largetable | index | NULL          | PRIMARY | 4       | NULL | 50000169 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from largetable force index (primary);
+----------+
| count(*) |
+----------+
| 50000000 |
+----------+
1 row in set (2 min 23.07 sec)

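So that is 5 min 52 sec with the secondary index versus 2 min 23 sec with the clustered index.

I would like to understand why MySQL's query optimizer chooses the secondary index.

The table has 50 million rows, with IDs running from 1 to 50 million (no gaps), inserted in sequential order.

This is on MySQL 5.5.11.

This is the table design: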

create table largetable (
  id     int   primary key   auto_increment,
  field1 int,
    index iif (field1),
  ... some more columns, some with indexes ... each row is about 115 bytes ...
);

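Solution:

The problem probably stems from the way the MySQL Query Optimizer makes its choices and the way InnoDB represents indexes internally.

First, look at the cardinality of the indexes. The cardinality of the primary key must always equal the actual row count of an InnoDB table. Now look at the cardinality of field1. If the cardinality of index iif is lower than that of the primary key, the MySQL Query Optimizer will choose the secondary index. To verify whether the cardinality of field1 is lower, run the following queries: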

SELECT COUNT(DISTINCT field1) FROM largetable;
SELECT field1,COUNT(1) fieldcount FROM largetable
GROUP BY field1 WITH ROLLUP;
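
The two queries above scan the table. The estimates the optimizer itself works with can be read without a scan. A minimal sketch, assuming the table name from the question; the Cardinality values InnoDB reports are sampled estimates, not exact counts:

```sql
-- One row per indexed column; the Cardinality column holds InnoDB's estimate.
SHOW INDEX FROM largetable;

-- The same estimates, read from INFORMATION_SCHEMA for the current database.
SELECT index_name, seq_in_index, column_name, cardinality
FROM information_schema.STATISTICS
WHERE table_schema = DATABASE()
  AND table_name = 'largetable';
```

Comparing the Cardinality of PRIMARY with that of iif shows which one the optimizer believes is lower.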

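Now look at the internal representation of the indexes. A secondary index entry contains two items: 1) the value of the indexed column, and 2) the primary key value that locates the row in the clustered index (InnoDB falls back to a hidden row ID in gen_clust_index only when the table has no primary key). As I understand it, each time a column is referenced through the secondary index, a lookup of the actual row is performed as well. Picture it this way: every row in InnoDB has two keys.

Putting these two things together, a secondary index with a lower cardinality than the primary key will, by this reasoning, still use the primary key to look up the actual rows. This would explain why the secondary index is chosen over the primary key and why the query takes twice as long or longer.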
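
Whether a particular query is answered from the secondary index alone or also has to visit the clustered index can be checked with EXPLAIN. A rough sketch, assuming the table from the question; the literal 42 is a hypothetical value chosen only for illustration:

```sql
-- Needs only field1 and id. Both are stored in the iif entries,
-- so the Extra column is expected to include "Using index".
EXPLAIN SELECT id, field1 FROM largetable WHERE field1 = 42;

-- Needs every column. The other columns live only in the clustered index,
-- so each matching entry requires a lookup by primary key,
-- and "Using index" should not appear in Extra.
EXPLAIN SELECT * FROM largetable WHERE field1 = 42;
```

In EXPLAIN output, "Using index" means MySQL reports that it reads the column data from the index tree only, without an additional read of the actual row.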

Some people would disagree with this line of reasoning because I answered a question similar to this in StackOverflow (Nov 15, 2011). Although my answer was accepted, it has mixed upvotes and downvotes because some do not view the MySQL Query Optimizer and InnoDB index structure the same way.

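If anyone from Percona sees this question and my answer and finds any flaw in my reasoning, please correct me so that everyone can learn.

UPDATE 2012-04-23 12:56 EDT

The InnoDB storage engine dives into the BTREE indexes to make educated guesses about cardinality. Try turning off innodb_stats_on_metadata: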

[mysqld]
innodb_stats_on_metadata = 0

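According to the documentation, when this variable is disabled, InnoDB does not update statistics during metadata statements such as SHOW TABLE STATUS or SHOW INDEX, or when the INFORMATION_SCHEMA tables TABLES or STATISTICS are accessed. Disabling it can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.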
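
The variable is dynamic, so it can also be changed on a running server without a restart. A minimal sketch, assuming an account with the privilege to set global variables:

```sql
-- Check the current value.
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_on_metadata';

-- Disable it at runtime; this setting is lost on restart
-- unless the [mysqld] entry above is also in the option file.
SET GLOBAL innodb_stats_on_metadata = 0;

-- Recompute the index statistics for the table once, on demand.
ANALYZE TABLE largetable;
```

With the variable off, statistics are no longer refreshed as a side effect of metadata statements, so ANALYZE TABLE is how to refresh them deliberately.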

Tags: query-performance, mysql, index-tuning
Source: https://codeday.me/bug/20190806/1594672.html