
03 MySQL index optimization - tuling

Author: (from the Internet)

Started: December 15, 2020
Finished: December 17, 2020

01 How MySQL chooses an index

# Contents of the employees table:
mysql> select * from employees order by id limit 10;
+----+-----------+-----+----------+---------------------+
| id | name      | age | position | hire_time           |
+----+-----------+-----+----------+---------------------+
|  4 | LiLei     |  22 | manager  | 2020-12-14 21:08:18 |
|  5 | HanMeimei |  23 | dev      | 2020-12-14 21:08:18 |
|  6 | Lucy      |  23 | dev      | 2020-12-14 21:08:18 |
|  7 | user7     |  21 | dev      | 2020-12-15 20:46:20 |
|  8 | user8     |  28 | dev      | 2020-12-15 20:46:20 |
|  9 | user9     |  17 | dev      | 2020-12-15 20:46:20 |
| 10 | user10    |  23 | dev      | 2020-12-15 20:46:20 |
| 11 | user11    |  29 | dev      | 2020-12-15 20:46:20 |
| 12 | user12    |  32 | dev      | 2020-12-15 20:46:20 |
| 13 | user13    |  21 | dev      | 2020-12-15 20:46:20 |
+----+-----------+-----+----------+---------------------+
10 rows in set (0.00 sec)
mysql> select * from employees order by id desc limit 3;     
+--------+------------+-----+----------+---------------------+
| id     | name       | age | position | hire_time           |
+--------+------------+-----+----------+---------------------+
| 432828 | user432828 |  31 | dev      | 2020-12-15 20:58:34 |
| 432827 | user432827 |  25 | dev      | 2020-12-15 20:58:34 |
| 432826 | user432826 |  57 | dev      | 2020-12-15 20:58:34 |
+--------+------------+-----+----------+---------------------+
3 rows in set (0.00 sec)
# The table has 432825 rows in total and all 432825 satisfy the condition below; rows=432390 is only an estimate.
mysql> explain select * from employees where name > 'a';
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys         | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_name_age_position | NULL | NULL    | NULL | 432390 |    50.00 | Using where |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

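Using the name index here would mean traversing the composite index tree on name and then, for every primary key value found, looking up the full row in the primary key index tree. That costs more than a full table scan. A covering index avoids this: the query then only needs to traverse the composite index tree to get all results, as follows: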

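Note: think in terms of InnoDB's primary key B+Tree and secondary index B+Tree. The SQL above searches the secondary index B+Tree to obtain primary key values, and then has to search the primary key B+Tree to fetch hire_time, the one column of select * that is not part of the composite index (name, age, position).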

mysql> explain select name, age, position from employees where name > 'a';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL | 216195 |   100.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
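The difference between the two plans can be sketched with a toy model of InnoDB's two B+Trees. This is an illustration under assumptions, not InnoDB's storage format: the clustered index is a plain dict from id to row, the secondary index idx_name_age_position is a sorted list of (name, age, position, id) tuples, and `casefold()` stands in for a case-insensitive collation. The function `range_scan` is hypothetical; it counts how many trips back to the clustered index a `name > ?` scan needs.

```python
# Toy clustered (primary key) index: id -> full row (name, age, position, hire_time)
clustered = {
    4: ("LiLei", 22, "manager", "2020-12-14 21:08:18"),
    5: ("HanMeimei", 23, "dev", "2020-12-14 21:08:18"),
    6: ("Lucy", 23, "dev", "2020-12-14 21:08:18"),
    7: ("user7", 21, "dev", "2020-12-15 20:46:20"),
}
# Toy secondary index idx_name_age_position: sorted (name, age, position, id) entries
secondary = sorted((name, age, pos, pk) for pk, (name, age, pos, _) in clustered.items())


def range_scan(lower, covering):
    """Scan entries with name > lower; return (rows, clustered-index lookups)."""
    rows, lookups = [], 0
    for name, age, pos, pk in secondary:
        # casefold() roughly mimics a case-insensitive collation such as utf8_general_ci
        if name.casefold() > lower.casefold():
            if covering:
                rows.append((name, age, pos))  # every selected column is in the entry
            else:
                rows.append(clustered[pk])     # back to the clustered index for hire_time
                lookups += 1
    return rows, lookups


print(range_scan("a", covering=False)[1])  # 4: one lookup per matching entry
print(range_scan("a", covering=True)[1])   # 0: covering index, no lookups
print(range_scan("usf", covering=False))   # ([], 0): 'user7' sorts before 'usf'
```

With `select *`, every matching entry costs an extra lookup, which is why a range covering most of the table loses to a full scan; with the covering column list, the lookups disappear.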

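Why, then, does the following query use the index? Because 'usf' sorts after every 'use*' value (such as the userN names), so the range name > 'usf' matches almost no rows (the estimate is rows=1). For the same reason 'v' would also work, since 'v' sorts after every 'u*' value.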

mysql> explain select * from employees where name > 'usf';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

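Why does MySQL use the index for name > 'usf' but not for name > 'a'? More generally, when a table has several indexes, how does MySQL pick one? The optimizer trace tool lets us find out. Enabling trace affects MySQL performance, so enable it only temporarily to analyze a statement and turn it off immediately afterwards.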

(1.1) Using the trace tool

# Enable trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on; 

mysql> select * from employees where name > 'a' order by position;
mysql> SELECT * FROM information_schema.OPTIMIZER_TRACE;

# Turn trace off as soon as the analysis is done
mysql> set session optimizer_trace="enabled=off";
{
  "steps": [
    {
      // Phase 1: SQL preparation
      "join_preparation": { 
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'a') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      // Phase 2: SQL optimization
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            // condition processing
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'a')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": {
            } /* substitute_generated_columns */
          },
          {
            // table dependency details
            "table_dependencies": [
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            // estimate the table access cost
            "rows_estimation": [
              {
                "table": "`employees`",
                "range_analysis": {
                  // full table scan
                  "table_scan": {
                    // rows scanned
                    "rows": 432390,
                    // query cost
                    "cost": 87795
                  } /* table_scan */,
                  // indexes the query could use
                  "potential_range_indexes": [
                    {
                      // primary key index
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      // secondary index
                      "index": "idx_name_age_position",
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indexes */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  // cost analysis for each index
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position",
                        // index range used
                        "ranges": [
                          "a < name"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        // whether rows fetched through this index come back in primary key order
                        "rowid_ordered": false,
                        "using_mrr": false,
                        // whether this is a covering index
                        "index_only": false,
                        // rows scanned through the index
                        "rows": 216195,
                        // cost of using the index
                        "cost": 259435,
                        // whether this index is chosen
                        "chosen": false,
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                // best access path
                "best_access_path": {
                  // access paths considered
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 432390,
                      // access type: scan, i.e. full table scan
                      "access_type": "scan",
                      "resulting_rows": 432390,
                      "cost": 87793,
                      // chosen
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 432390,
                "cost_for_plan": 87793,
                "sort_cost": 432390,
                "new_cost_for_plan": 520183,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`employees`.`name` > 'a')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'a')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`employees`.`position`",
              "items": [
                {
                  "item": "`employees`.`position`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`employees`.`position`"
            } /* clause_processing */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "unknown",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          },
          {
            "refine_plan": [
              {
                "table": "`employees`"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      // Phase 3: SQL execution
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`employees`",
                "field": "position"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 432825,
              "examined_rows": 432825,
              "number_of_tmp_files": 127,
              "sort_buffer_size": 262056,
              "sort_mode": "<sort_key, packed_additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}

Conclusion: the full table scan costs less than the index scan (87795 vs 259435), so MySQL chooses the full table scan.
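Where do numbers like 87795 and 259435 come from? A rough sketch, assuming MySQL 5.7's default cost constants (row_evaluate_cost = 0.2, io_block_read_cost = 1.0) and a simplified formula: a full scan reads every page and evaluates every row, while a non-covering range scan pays roughly one random page read plus one row evaluation per row. The page count 1317 is an assumption inferred by subtracting 432390 × 0.2 from the trace's 87795; the real optimizer adds further terms (for example, the rows=1 range costs 2.21 in the trace, not 1.2).

```python
# Assumed MySQL 5.7 default cost constants (server_cost / engine_cost tables)
ROW_EVALUATE_COST = 0.2
IO_BLOCK_READ_COST = 1.0


def table_scan_cost(rows, pages):
    # read every page, then evaluate every row against the WHERE condition
    return pages * IO_BLOCK_READ_COST + rows * ROW_EVALUATE_COST


def non_covering_range_cost(rows):
    # simplification: one random page read per row to fetch it from the
    # clustered index, plus evaluating that row
    return rows * (IO_BLOCK_READ_COST + ROW_EVALUATE_COST)


def choose(rows_total, pages, rows_in_range):
    """Pick the cheaper plan; returns (plan_name, estimated_cost)."""
    scan = table_scan_cost(rows_total, pages)
    rng = non_covering_range_cost(rows_in_range)
    return ("range", rng) if rng < scan else ("scan", scan)


print(choose(432390, 1317, 216195))  # name > 'a': the scan is cheaper
print(choose(432390, 1317, 1))       # name > 'usf': the range is cheaper
```

The sketch reproduces the trace's two big numbers to within one unit (87795 and 259434), which is enough to see why a range covering half the table loses: each matching row pays for a lookup that a sequential scan gets almost for free.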

# Enable trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on; 

mysql> select * from employees where name > 'usf' order by position;
mysql> SELECT * FROM information_schema.OPTIMIZER_TRACE;
 
# Turn trace off as soon as the analysis is done
mysql> set session optimizer_trace="enabled=off";

The trace shows that the index scan costs less than the full table scan (2.21 vs 87795), so MySQL chooses the index scan. The trace output:

{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'usf') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'usf')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'usf')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'usf')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'usf')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": {
            } /* substitute_generated_columns */
          },
          {
            "table_dependencies": [
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [
              {
                "table": "`employees`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 432390,
                    "cost": 87795
                  } /* table_scan */,
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_name_age_position",
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indexes */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position",
                        "ranges": [
                          "usf < name"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 1,
                        "cost": 2.21,
                        "chosen": true
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */,
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx_name_age_position",
                      "rows": 1,
                      "ranges": [
                        "usf < name"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 1,
                    "cost_for_plan": 2.21,
                    "chosen": true
                  } /* chosen_range_access_summary */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 1,
                      "access_type": "range",
                      "range_details": {
                        "used_index": "idx_name_age_position"
                      } /* range_details */,
                      "resulting_rows": 1,
                      "cost": 2.41,
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 1,
                "cost_for_plan": 2.41,
                "sort_cost": 1,
                "new_cost_for_plan": 3.41,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`employees`.`name` > 'usf')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'usf')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`employees`.`position`",
              "items": [
                {
                  "item": "`employees`.`position`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`employees`.`position`"
            } /* clause_processing */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "idx_name_age_position",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          },
          {
            "refine_plan": [
              {
                "table": "`employees`",
                "pushed_index_condition": "(`employees`.`name` > 'usf')",
                "table_condition_attached": null
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`employees`",
                "field": "position"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 0,
              "examined_rows": 0,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 262056,
              "sort_mode": "<sort_key, packed_additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}
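Comparing traces by eye gets tedious. Below is a minimal sketch for pulling the table-scan cost and the per-index range costs out of a trace, assuming you have already copied the TRACE column of information_schema.OPTIMIZER_TRACE into a string. With end_markers_in_json=on the real output contains `/* ... */` markers, which `json.loads` rejects, so they are stripped first; the `//` comments in the first listing above were added by hand and are not part of the real output. The helper names `load_trace` and `range_costs` are hypothetical.

```python
import json
import re

# Matches the /* ... */ end markers added by end_markers_in_json=on
END_MARKER = re.compile(r"/\*.*?\*/")


def load_trace(text):
    """Parse the TRACE column text into Python objects."""
    # Naive: this also removes "/* select#1 */" inside expanded_query strings,
    # which changes that string's text but keeps the JSON valid.
    return json.loads(END_MARKER.sub("", text))


def range_costs(trace):
    """Return (table_scan_cost, {index: (cost, chosen)}) for the first table found."""
    for step in trace["steps"]:
        for sub in step.get("join_optimization", {}).get("steps", []):
            for est in sub.get("rows_estimation", []):
                ra = est.get("range_analysis")
                if ra is None:
                    continue  # no range analysis for this table
                alts = ra.get("analyzing_range_alternatives", {}).get("range_scan_alternatives", [])
                return ra["table_scan"]["cost"], {a["index"]: (a["cost"], a.get("chosen")) for a in alts}
    return None
```

Applied to the two traces above, this would give a table-scan cost of 87795 in both cases, against 259435 (not chosen) for name > 'a' and 2.21 (chosen) for name > 'usf'.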

02 In-depth optimization of common SQL

(2.1) Optimizing order by and group by

(1) case:explain select * from employees where name = 'LiLei' and position = 'manager' order by age;

mysql> explain select * from employees where name = 'LiLei' and position = 'manager' order by age;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |    10.00 | Using index condition |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

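By the leftmost prefix rule, index columns must be used without a gap in the middle, so the lookup uses only the name part of the index (key_len=74 confirms this; position cannot narrow the lookup because age is skipped, and is applied as an index condition instead). The age column serves the sort, since Extra shows no Using filesort.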

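Note: picture the composite index B+Tree. name = 'LiLei' narrows the search to one contiguous range of index entries, and within that range the entries are already sorted by age; position only filters entries out of that range, which does not change their order. That is why the age index column can serve the sort.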
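The note above can be checked with a toy model: the composite index as a sorted list of (name, age, position, id) tuples, which is how the B+Tree leaf level is ordered. The rows are hypothetical, chosen so that several entries share one name.

```python
# Toy idx_name_age_position: sorted (name, age, position, id) entries (hypothetical rows)
index = sorted([
    ("LiLei", 30, "dev", 11),
    ("LiLei", 22, "manager", 4),
    ("LiLei", 25, "manager", 12),
    ("LiLei", 27, "dev", 13),
    ("Lucy", 23, "dev", 6),
])

# where name = 'LiLei' and position = 'manager':
# the name = 'LiLei' entries form one contiguous run; position only filters that run
rows = [e for e in index if e[0] == "LiLei" and e[2] == "manager"]
ages = [e[1] for e in rows]
print(ages, ages == sorted(ages))  # prints [22, 25] True: order by age needs no filesort
```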

(2) case:explain select * from employees where name = 'LiLei' order by position;

mysql> explain select * from employees where name = 'LiLei' order by position;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

From the explain output: key_len=74, so the lookup uses the name part of the index. Because the sort is on position, which skips age, Using filesort appears.
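Using the same kind of toy index (hypothetical rows), the run of entries for one name is ordered by age, not by position, so sorting on position alone requires an extra sort step:

```python
# Toy idx_name_age_position: sorted (name, age, position, id) entries (hypothetical rows)
index = sorted([
    ("LiLei", 30, "dev", 11),
    ("LiLei", 22, "manager", 4),
    ("LiLei", 25, "manager", 12),
    ("LiLei", 27, "dev", 13),
])

run = [e for e in index if e[0] == "LiLei"]  # contiguous run for name = 'LiLei'
positions = [e[2] for e in run]
print(positions)                        # ['manager', 'manager', 'dev', 'dev'] (age order)
print(positions == sorted(positions))   # False: order by position needs a filesort
```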

(3) case:explain select * from employees where name = 'LiLei' order by age, position;

mysql> explain select * from employees where name = 'LiLei' order by age, position;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The lookup uses only the name part of the index; age and position serve the sort, so there is no Using filesort.

(4) case:explain select * from employees where name = 'LiLei' order by position, age;

mysql> explain select * from employees where name = 'LiLei' order by position, age;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

The explain output is the same as in case (3) except that Using filesort appears: the index was created in the order name, age, position, but the ORDER BY lists age and position in reverse order.

(5) case:explain select * from employees where name = 'LiLei' and age = 22 order by position, age;

mysql> explain select * from employees where name = 'LiLei' and age = 22 order by position, age;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-----------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 78      | const,const |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

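Compared with case (4), Extra shows no Using filesort. age is a constant here (key_len=78 shows that both name and age are used for the lookup), so the optimizer drops it from the sort; the ORDER BY effectively becomes order by position, which follows the index order, so no Using filesort occurs.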
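A toy-index sketch of case (5), with hypothetical rows: once name and age are both fixed, the matching entries are already in position order, and because age is the same in every row, sorting by (position, age) gives exactly that order.

```python
# Toy idx_name_age_position: sorted (name, age, position, id) entries (hypothetical rows)
index = sorted([
    ("LiLei", 22, "manager", 4),
    ("LiLei", 22, "dev", 14),
    ("LiLei", 22, "hr", 15),
    ("LiLei", 25, "boss", 12),
])

# where name = 'LiLei' and age = 22
run = [e for e in index if e[0] == "LiLei" and e[1] == 22]
# order by position, age: age is constant, so this is just order by position
wanted = sorted(run, key=lambda e: (e[2], e[1]))
print(run == wanted)  # True: the index order already satisfies the ORDER BY
```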

(6) case:explain select * from employees where name = 'LiLei' order by age asc, position desc;

mysql> explain select * from employees where name = 'LiLei' order by age asc, position desc;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

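Although the ORDER BY columns follow the index order, and order by is ascending by default, position desc is descending here, so the sort direction differs from the index's and Using filesort appears. MySQL 8.0 and later support descending indexes, which can serve this kind of query.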
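Why mixed directions break index sorting, sketched with a hypothetical ascending toy index: the index can only be read forwards (age asc, position asc) or backwards (age desc, position desc), and neither matches age asc, position desc.

```python
# Toy ascending idx_name_age_position (hypothetical rows)
index = sorted([
    ("LiLei", 22, "dev", 1),
    ("LiLei", 22, "manager", 2),
    ("LiLei", 25, "dev", 3),
    ("LiLei", 25, "manager", 4),
])

run = [(e[1], e[2]) for e in index if e[0] == "LiLei"]  # (age, position) pairs
forward, backward = run, run[::-1]  # the only two orders an index scan can produce

# order by age asc, position desc: sort by position desc, then stable sort by age asc
wanted = sorted(sorted(run, key=lambda p: p[1], reverse=True), key=lambda p: p[0])
print(wanted)                         # [(22, 'manager'), (22, 'dev'), (25, 'manager'), (25, 'dev')]
print(wanted in (forward, backward))  # False: neither scan direction works, hence filesort
```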

(7) case:explain select * from employees where name in ('LiLei','user10') order by age, position;

mysql> explain select * from employees where name in ('LiLei','user10') order by age, position;
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL |    2 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.01 sec)

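For sorting purposes, several equality values (an IN list) also count as a range query: the entries for each name are sorted by age, but the entries of the two names taken together are not, so a filesort is needed.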
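A toy-index sketch of case (7), with hypothetical rows: an IN list reads one run per value, and concatenating two age-ordered runs does not give a globally age-ordered result.

```python
# Toy idx_name_age_position: sorted (name, age, position, id) entries (hypothetical rows)
index = sorted([
    ("LiLei", 22, "manager", 4),
    ("LiLei", 30, "dev", 16),
    ("user10", 23, "dev", 10),
    ("user10", 28, "dev", 17),
])

# name in ('LiLei', 'user10'): two separate runs of the index, read one after the other
rows = [e for e in index if e[0] in ("LiLei", "user10")]
ages = [e[1] for e in rows]
print(ages)                  # [22, 30, 23, 28]
print(ages == sorted(ages))  # False: each run is age-ordered, the concatenation is not
```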

(8) case:explain select * from employees where name > 'a' order by name;

mysql> explain select * from employees where name > 'a' order by name;
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key  | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_name_age_position | NULL | NULL    | NULL | 432390 |    50.00 | Using where; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

# Optimize with a covering index instead
mysql> explain select name, age, position from employees where name > 'a' order by name;
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL | 216195 |   100.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

(2.2) ORDER BY optimization summary

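  1. MySQL supports two ways of sorting: filesort and index. Using index means MySQL completes the sort by scanning the index itself. Index sorting is efficient; filesort is less efficient.
  2. ORDER BY uses Using index in two cases:
    • (1) The ORDER BY clause follows the leftmost-prefix rule of the index.
    • (2) The WHERE clause columns combined with the ORDER BY columns follow the leftmost-prefix rule of the index.
  3. Sort on index columns whenever possible, following the leftmost-prefix rule of the index definition (the column order used when the index was created).
  4. If the ORDER BY columns are not index columns, Using filesort occurs.
  5. Use a covering index whenever possible.
  6. GROUP BY is very similar to ORDER BY: it essentially sorts first and then groups, and it also follows the leftmost-prefix rule of the index column order. If a GROUP BY does not need sorted output, add ORDER BY NULL to suppress the sort. This matters in MySQL 5.7, where GROUP BY sorts implicitly; MySQL 8.0 no longer does. Also note that WHERE is preferable to HAVING: any condition that can be written in WHERE should not be left to HAVING.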
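Rule 2 above can be expressed as a small check. This is a simplified sketch, not MySQL's actual optimizer logic. It assumes every column in `eq_cols` is pinned by a `col = constant` condition in WHERE. It ignores range conditions and mixed ASC/DESC. The function name `order_by_can_use_index` is hypothetical.

```python
def order_by_can_use_index(index_cols, eq_cols, order_cols):
    """Simplified check: can ORDER BY order_cols be served by an index on index_cols?

    index_cols: column order of the composite index, e.g. ["name", "age", "position"]
    eq_cols:    columns fixed to a single constant by WHERE col = const (assumption)
    order_cols: columns listed in ORDER BY, in order
    """
    matched = 0
    for col in index_cols:
        if matched < len(order_cols) and col == order_cols[matched]:
            matched += 1      # the next ORDER BY column follows the index order
        elif col in eq_cols:
            continue          # a constant column can be skipped without breaking the order
        else:
            break             # gap in the leftmost prefix: index order is lost
    return matched == len(order_cols)


idx = ["name", "age", "position"]
print(order_by_can_use_index(idx, {"name"}, ["age", "position"]))  # True
print(order_by_can_use_index(idx, {"name"}, ["position"]))         # False (Using filesort)
print(order_by_can_use_index(idx, set(), ["position", "age"]))     # False (Using filesort)
```

The second call mirrors `where name = ... order by position`. The gap at `age` breaks the leftmost prefix, so the model predicts a filesort.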

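(2.3) How Using filesort works

filesort sort modes (these optimization ideas apply when a filesort cannot be turned into an index sort):

(1) Single-pass sort: all columns of each matching row are fetched at once and then sorted in the sort buffer. In the optimizer trace, sort_mode shows <sort_key, additional_fields> or <sort_key, packed_additional_fields>.

Note: in <sort_key, additional_fields>, sort_key is the sort column and additional_fields are the other columns.

(2) Two-pass sort (also called the row-lookup sort mode): for each matching row, only the sort column and a column that locates the row directly (such as the primary key id or a unique index) are fetched and sorted in the sort buffer. After sorting, the other required columns are fetched from the table again. In the optimizer trace, sort_mode shows <sort_key, rowid>.

Note: in <sort_key, rowid>, sort_key is the sort column and rowid is the column that locates the row directly.

MySQL picks the sort mode by comparing the system variable max_length_for_sort_data (1024 bytes by default in MySQL 5.7) with the total size of the queried columns. For example, for name, age and position the total size is 140 bytes, matching the key_len of idx_name_age_position.

(1) If max_length_for_sort_data is larger than the total length of the queried columns, single-pass sort is used;

(2) If max_length_for_sort_data is smaller than the total length of the queried columns, two-pass sort is used.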
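The selection rule can be written as a one-line comparison. This is a sketch of the MySQL 5.7 rule as stated above, with these assumptions:

- The column sizes 74, 4 and 62 bytes are derived from key_len 140 (utf8 varchar(24), int, utf8 varchar(20)).
- Treating an exact tie as single-pass is an assumption.
- The model does not distinguish the packed variant of single-pass sort.
- `choose_sort_mode` is a hypothetical name.

```python
def choose_sort_mode(column_bytes, max_length_for_sort_data=1024):
    """Simplified model: compare the total size of the queried columns
    with max_length_for_sort_data (MySQL 5.7 default: 1024 bytes)."""
    total = sum(column_bytes)
    if total > max_length_for_sort_data:
        return "<sort_key, rowid>"            # two-pass sort
    return "<sort_key, additional_fields>"    # single-pass sort


cols = [74, 4, 62]                # name + age + position = 140 bytes (assumed sizes)
print(choose_sort_mode(cols))      # <sort_key, additional_fields>
print(choose_sort_mode(cols, 10))  # <sort_key, rowid>
```

The second call mirrors the `set max_length_for_sort_data = 10;` experiment below, where the trace switches to `<sort_key, rowid>`.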

set session optimizer_trace="enabled=on",end_markers_in_json=on; 

select * from employees where name = 'user' order by position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

{
  // SQL execution phase
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "filesort_information": [
          {
            "direction": "asc",
            "table": "`employees`",
            "field": "position"
          }
        ] /* filesort_information */,
        "filesort_priority_queue_optimization": {
          "usable": false,
          "cause": "not applicable (no LIMIT)"
        } /* filesort_priority_queue_optimization */,
        "filesort_execution": [
        ] /* filesort_execution */,
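        // filesort summary
        "filesort_summary": {
          // estimated number of rows scanned
          "rows": 0,
          // number of rows that took part in the sort
          "examined_rows": 0,
          // number of temporary files used; 0 means the sort ran entirely in sort_buffer memory, otherwise a disk-based file sort was used
          "number_of_tmp_files": 0,
          // size of the sort buffer (sort_buffer)
          "sort_buffer_size": 262056,
          // sort mode: single-pass sort here
          "sort_mode": "<sort_key, packed_additional_fields>"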
        } /* filesort_summary */
      }
    ] /* steps */
  } /* join_execution */
}
set max_length_for_sort_data = 10;

select * from employees where name = 'user' order by position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

set session optimizer_trace="enabled=off";
{
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "filesort_information": [
          {
            "direction": "asc",
            "table": "`employees`",
            "field": "position"
          }
        ] /* filesort_information */,
        "filesort_priority_queue_optimization": {
          "usable": false,
          "cause": "not applicable (no LIMIT)"
        } /* filesort_priority_queue_optimization */,
        "filesort_execution": [
        ] /* filesort_execution */,
        "filesort_summary": {
          "rows": 0,
          "examined_rows": 0,
          "number_of_tmp_files": 0,
          "sort_buffer_size": 262136,
          // two-pass sort
          "sort_mode": "<sort_key, rowid>"
        } /* filesort_summary */
      }
    ] /* steps */
  } /* join_execution */
}
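
Detailed steps of single-pass sort:
  1. Find the first primary key id satisfying name = 'user' in the name index;
  2. Fetch the whole row by that primary key id and put the values of all its columns into sort_buffer;
  3. Find the next primary key id satisfying name = 'user' in the name index;
  4. Repeat steps 2 and 3 until name = 'user' no longer matches;
  5. Sort the data in sort_buffer by position;
  6. Return the result to the client.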
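
Detailed steps of two-pass sort:
  1. Find the first primary key id satisfying name = 'user' in the name index;
  2. Fetch the whole row by that primary key id and put only the sort column position and the primary key id into the sort buffer;
  3. Fetch the next primary key id satisfying name = 'user' from the name index;
  4. Repeat steps 2 and 3 until name = 'user' no longer matches;
  5. Sort the (position, id) pairs in sort_buffer by position;
  6. Walk the sorted ids and positions, go back to the table by id to fetch all column values, and return them to the client.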
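The two step lists can be simulated side by side. In this sketch the "table" is a Python dict keyed by primary key, and the name index is a sorted list of (name, id) pairs. The data, `TABLE`, `NAME_INDEX` and the function names are all hypothetical.

```python
# Hypothetical in-memory "table": primary key id -> full row
TABLE = {
    1: {"id": 1, "name": "user", "age": 30, "position": "manager"},
    2: {"id": 2, "name": "user", "age": 25, "position": "dev"},
    3: {"id": 3, "name": "other", "age": 40, "position": "boss"},
    4: {"id": 4, "name": "user", "age": 22, "position": "cto"},
}
# Secondary index on name: (name, id) pairs kept in sorted order
NAME_INDEX = sorted((row["name"], pk) for pk, row in TABLE.items())


def matching_ids(name):
    """Walk the name index and yield the ids whose name equals `name`."""
    for key, pk in NAME_INDEX:
        if key < name:
            continue          # not yet in the matching range
        if key != name:
            break             # past the last entry with this name
        yield pk


def single_pass_sort(name, sort_col):
    # Steps 1-4: every column of each matching row goes into the sort buffer.
    sort_buffer = [dict(TABLE[pk]) for pk in matching_ids(name)]
    # Step 5: sort in the buffer; step 6: return the rows directly.
    sort_buffer.sort(key=lambda row: row[sort_col])
    return sort_buffer


def two_pass_sort(name, sort_col):
    # Steps 1-4: only (sort column, id) pairs go into the sort buffer.
    sort_buffer = [(TABLE[pk][sort_col], pk) for pk in matching_ids(name)]
    # Step 5: sort the pairs.
    sort_buffer.sort()
    # Step 6: go back to the table by id to fetch all columns.
    return [dict(TABLE[pk]) for _, pk in sort_buffer]


print([r["id"] for r in single_pass_sort("user", "position")])  # [4, 2, 1]
print([r["id"] for r in two_pass_sort("user", "position")])     # [4, 2, 1]
```

Both modes return the same rows in the same order. The difference is only in what the sort buffer holds: full rows in single-pass, small (key, id) pairs in two-pass, which then requires an extra lookup per row.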

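Comparing the two modes: single-pass sort puts every queried column into the sort buffer. Two-pass sort puts only the primary key and the sort column into the sort buffer, then goes back to the table by primary key to fetch the required columns.

If MySQL's sort memory is small (a small sort_buffer) and cannot be increased, you can lower max_length_for_sort_data so that the optimizer chooses two-pass sort. More rows then fit into sort_buffer per sort, at the cost of going back to the table by primary key.

If MySQL can be given more sort memory, you can raise max_length_for_sort_data so that the optimizer prefers full-column (single-pass) sort. The required columns are then kept in sort_buffer, and the query results are returned directly from memory after sorting.

So MySQL uses max_length_for_sort_data to choose a different sort mode for different scenarios and improve sort efficiency. Tuning this parameter aims to keep the sort inside sort_buffer, in memory.

Note: sorting entirely in sort_buffer memory is generally faster than a disk file sort, but that is no reason to enlarge sort_buffer_size arbitrarily. Its default is 256 KB, matching the sort_buffer_size of about 262 KB shown in the traces above. Many MySQL defaults are already well tuned; do not change them lightly.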

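(2.4) Pagination query optimization

Business systems often implement pagination with SQL like this:

select * from employees limit 10000, 10;

This fetches 10 rows starting from row 10001 of employees. It looks like only 10 rows are read, but MySQL actually reads 10010 rows, discards the first 10000, and returns the last 10. Reading rows far into a large table this way is therefore very inefficient.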
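The cost of `LIMIT offset, count` can be shown with a toy model that only counts rows read. It is not a model of InnoDB internals. `limit_offset` is a hypothetical name.

```python
def limit_offset(rows, offset, count):
    """Mimic LIMIT offset, count: rows are read one by one and the first
    `offset` of them are thrown away. Returns (page, rows_read)."""
    page, rows_read = [], 0
    for row in rows:
        rows_read += 1
        if rows_read <= offset:
            continue                  # read, then discarded
        page.append(row)
        if len(page) == count:
            break
    return page, rows_read


page, rows_read = limit_offset(range(1, 20001), 10000, 10)
print(page[0], page[-1], rows_read)   # 10001 10010 10010
```

Only 10 rows are returned, but 10010 are read. The number read grows with the offset, which is why deep pages are slow.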

(1) Pagination sorted by an auto-increment, contiguous primary key

mysql> select * from employees limit 400000, 5;
+--------+------------+-----+----------+---------------------+
| id     | name       | age | position | hire_time           |
+--------+------------+-----+----------+---------------------+
| 400004 | user400004 |  30 | dev      | 2020-12-15 20:57:38 |
| 400005 | user400005 |  84 | dev      | 2020-12-15 20:57:38 |
| 400006 | user400006 |  28 | dev      | 2020-12-15 20:57:38 |
| 400007 | user400007 |  31 | dev      | 2020-12-15 20:57:38 |
| 400008 | user400008 |  16 | dev      | 2020-12-15 20:57:38 |
+--------+------------+-----+----------+---------------------+
5 rows in set (0.10 sec)

mysql> explain select * from employees limit 400000, 5;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 432390 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
1 row in set, 1 warning (0.00 sec)

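This SQL returns five rows starting from row 400001. Without an ORDER BY, the rows here come back in primary-key order because InnoDB scans the clustered index; SQL itself does not guarantee any order without ORDER BY. If the primary key of employees is auto-increment and contiguous, the query can be rewritten to fetch the five rows after id 400000 through the primary key: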

mysql> select * from employees where id > 400000 limit 5;
+--------+------------+-----+----------+---------------------+
| id     | name       | age | position | hire_time           |
+--------+------------+-----+----------+---------------------+
| 400001 | user400001 |  28 | dev      | 2020-12-15 20:57:38 |
| 400002 | user400002 |  19 | dev      | 2020-12-15 20:57:38 |
| 400003 | user400003 |  95 | dev      | 2020-12-15 20:57:38 |
| 400004 | user400004 |  30 | dev      | 2020-12-15 20:57:38 |
| 400005 | user400005 |  84 | dev      | 2020-12-15 20:57:38 |
+--------+------------+-----+----------+---------------------+
5 rows in set (0.00 sec)
mysql> explain select * from employees where id > 400000 limit 5;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 66638 |   100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

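The rewritten SQL uses the index and scans far fewer rows, so it runs faster. However, the rewrite is not practical in many scenarios. If some rows have been deleted, the primary key has gaps and the results differ. They differ above: the ids in this table start at 4, so the original query returns ids 400004 to 400008, while the rewrite returns 400001 to 400005.

Because the two queries return different results, this optimization cannot be used when the primary key is not contiguous. Also, if the original SQL orders by a non-primary-key column, this rewrite also changes the results. So the rewrite requires both of the following conditions:

  1. The primary key is auto-increment and contiguous;
  2. The results are sorted by the primary key.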
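Both conditions can be checked with a small experiment. This sketch uses SQLite (through Python's `sqlite3`) as a stand-in for MySQL, which is an assumption. It adds an explicit `ORDER BY id` because SQLite, like MySQL, promises no order without one. `compare_pagination` is a hypothetical helper.

```python
import sqlite3


def compare_pagination(ids, offset, count):
    """Return (LIMIT offset, count result, WHERE id > offset result) for a table with the given ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO employees (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in ids])
    # Original form: skip `offset` rows.
    by_offset = [r[0] for r in conn.execute(
        "SELECT id FROM employees ORDER BY id LIMIT ?, ?", (offset, count))]
    # Rewritten form: jump straight to id > offset through the primary key.
    by_key = [r[0] for r in conn.execute(
        "SELECT id FROM employees WHERE id > ? ORDER BY id LIMIT ?", (offset, count))]
    conn.close()
    return by_offset, by_key


print(compare_pagination(range(1, 21), 10, 5))  # ([11, 12, 13, 14, 15], [11, 12, 13, 14, 15])
print(compare_pagination(range(4, 21), 10, 5))  # ([14, 15, 16, 17, 18], [11, 12, 13, 14, 15])
```

With ids contiguous from 1 the two forms agree. With ids starting at 4, like the employees table above, they diverge in the same way as the MySQL results.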

(2) Pagination sorted by a non-primary-key column

Now consider pagination sorted by a non-primary-key column:

mysql> select * from employees order by name limit 400000, 5;
+-------+-----------+-----+----------+---------------------+
| id    | name      | age | position | hire_time           |
+-------+-----------+-----+----------+---------------------+
| 70456 | user70456 |  22 | dev      | 2020-12-15 20:48:20 |
| 70457 | user70457 |  26 | dev      | 2020-12-15 20:48:20 |
| 70458 | user70458 |  23 | dev      | 2020-12-15 20:48:20 |
| 70459 | user70459 |  12 | dev      | 2020-12-15 20:48:20 |
|  7046 | user7046  |  36 | dev      | 2020-12-15 20:46:33 |
+-------+-----------+-----+----------+---------------------+
5 rows in set (0.39 sec)

mysql> explain select * from employees order by name limit 400000, 5;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 432389 |   100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

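The name index is not used (key is NULL). The optimizer estimates that scanning the entire secondary index and then looking up each row's remaining columns in the clustered index would cost more than a full table scan, because every row means traversing a second index tree. So it gives up on the index.

Knowing why the index is skipped, how do we optimize? The key is to return as few columns as possible during the sort: use a covering index instead of select *. So let the sort and pagination fetch only the primary keys first. idx_name_age_position also stores id, so this is a covering-index scan. Then fetch the full rows by primary key. The SQL is rewritten as:

# Although this queries twice (the derived table first, then the full rows by id), it is still faster than the query above.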
mysql> select * from employees e inner join (select id from employees order by name limit 400000, 5) ed on e.id = ed.id;
+-------+-----------+-----+----------+---------------------+-------+
| id    | name      | age | position | hire_time           | id    |
+-------+-----------+-----+----------+---------------------+-------+
| 70456 | user70456 |  22 | dev      | 2020-12-15 20:48:20 | 70456 |
| 70457 | user70457 |  26 | dev      | 2020-12-15 20:48:20 | 70457 |
| 70458 | user70458 |  23 | dev      | 2020-12-15 20:48:20 | 70458 |
| 70459 | user70459 |  12 | dev      | 2020-12-15 20:48:20 | 70459 |
|  7046 | user7046  |  36 | dev      | 2020-12-15 20:46:33 |  7046 |
+-------+-----------+-----+----------+---------------------+-------+
5 rows in set (0.10 sec)

mysql> explain select * from employees e inner join (select id from employees order by name limit 400000, 5) ed on e.id = ed.id;
+----+-------------+------------+------------+--------+---------------+-----------------------+---------+-------+--------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key                   | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+-----------------------+---------+-------+--------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL                  | NULL    | NULL  | 400005 |   100.00 | NULL        |
|  1 | PRIMARY     | e          | NULL       | eq_ref | PRIMARY       | PRIMARY               | 4       | ed.id |      1 |   100.00 | NULL        |
|  2 | DERIVED     | employees  | NULL       | index  | NULL          | idx_name_age_position | 140     | NULL  | 400005 |   100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+-----------------------+---------+-------+--------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

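The result matches the original SQL, and the execution time drops by more than half (0.39 sec to 0.10 sec). Comparing the execution plans, the original SQL uses a filesort, while the optimized SQL sorts using the index.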
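The claim that the deferred-join rewrite returns the same rows can be checked on a small dataset. This sketch uses SQLite as a stand-in for MySQL (an assumption); the 2000 generated rows are hypothetical. Two differences from the MySQL query above:

- It selects only `e.*` so both results have the same columns.
- It adds an outer `ORDER BY e.name`, because a join does not guarantee output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, position TEXT)")
conn.execute("CREATE INDEX idx_name_age_position ON employees (name, age, position)")
conn.executemany(
    "INSERT INTO employees (id, name, age, position) VALUES (?, ?, ?, ?)",
    [(i, f"user{i}", 20 + i % 40, "dev") for i in range(1, 2001)],
)

OFFSET, COUNT = 1500, 5

# Original form: sort whole rows, then skip OFFSET of them.
direct = conn.execute(
    "SELECT * FROM employees ORDER BY name LIMIT ?, ?", (OFFSET, COUNT)
).fetchall()

# Deferred join: page over ids only (covering index), then fetch full rows by id.
deferred = conn.execute(
    """SELECT e.* FROM employees e
       INNER JOIN (SELECT id FROM employees ORDER BY name LIMIT ?, ?) ed
       ON e.id = ed.id
       ORDER BY e.name""",
    (OFFSET, COUNT),
).fetchall()

print(direct == deferred)  # True
```

Because name is compared as a string, `user10` sorts before `user2`. This is the same effect seen in the MySQL output above, where `user7046` follows `user70459`.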

(2.5) JOIN query optimization

Create the test tables (to build a combined value such as user100, you can use CONCAT('user', 100)):

CREATE TABLE `t1`(
`id` int(11) NOT NULL AUTO_INCREMENT,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

create table t2 like t1;

# Note: dropping a stored procedure does not take the trailing parentheses.
drop procedure if exists insert_t;
delimiter ;;
create procedure insert_t()
begin
	declare i int;
	set i=1;
	while(i<=10000)do
		insert into t1(a,b) values(i,i);
		set i=i+1;
	end while;
end;;
delimiter ;
call insert_t();

# Insert 100 rows into t2 (t1 received 10000 rows above).
drop procedure if exists insert_t;
delimiter ;;
create procedure insert_t()
begin
	declare i int;
	set i=1;
	while(i<=100)do
		insert into t2(a,b) values(i,i);
		set i=i+1;
	end while;
end;;
delimiter ;
call insert_t();

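MySQL commonly uses two algorithms for table joins:

(1) Nested-Loop Join (NLJ)

Rows are read one at a time from the first table (the driving table). For each row, the join column value is taken and used to fetch the matching rows from the other table (the driven table). The results from both tables are then combined.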

mysql> select * from t1 inner join t2 on t1.a = t2.a;
+-----+------+------+-----+------+------+
| id  | a    | b    | id  | a    | b    |
+-----+------+------+-----+------+------+
|   1 |    1 |    1 |   1 |    1 |    1 |
|   2 |    2 |    2 |   2 |    2 |    2 |
|   3 |    3 |    3 |   3 |    3 |    3 |
|  .. |   .. |   .. |  .. |   .. |   .. |
|  99 |   99 |   99 |  99 |   99 |   99 |
| 100 |  100 |  100 | 100 |  100 |  100 |
+-----+------+------+-----+------+------+

# Executed from top to bottom: t2 is read first, then joined with t1. (When rows of the execution plan have the same id, they run from top to bottom.)
mysql> explain select * from t1 inner join t2 on t1.a = t2.a;
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | idx_a         | NULL  | NULL    | NULL        |  100 |   100.00 | Using where |
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_a         | idx_a | 5       | tuling.t2.a |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

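The execution plan shows:

  1. The driving table is t2 and the driven table is t1. The driving table runs first (rows with the same id run from top to bottom). The optimizer usually prefers the smaller table as the driving table, so with inner join the table written first is not necessarily the driving table.
  2. NLJ is used: in a join, if Extra does not show Using join buffer, the join algorithm is NLJ.

The SQL above runs roughly as follows:

  1. Read one row from t2;
  2. Take the join column a from that row and look it up in t1;
  3. Take the matching rows from t1, combine them with the t2 row, and return them to the client;
  4. Repeat the three steps above.

The whole process reads all rows of t2 (100 rows). For each row it uses the value of a to look up the matching row in t1 through the index. That is 100 index lookups on t1; each lookup can be treated as reading one full row of t1, so t1 also contributes 100 rows. The whole process therefore scans 200 rows.

Note: because lookups on the t1 index are fast, each lookup is treated here as yielding one row of t1. In practice a B+Tree index is typically 2 to 4 levels high, so strictly speaking the index is traversed 200 to 400 times, for 300 to 500 reads in total.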
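The 200-row figure can be reproduced with a small simulation. This sketch models t1 and t2 as Python lists with a = b = id, as in the stored procedures above. The index idx_a is modeled as a dict, so each lookup counts as one row read, matching the simplification in the note. All names are hypothetical.

```python
# Hypothetical data mirroring t1 (10000 rows) and t2 (100 rows), with a = b = i.
t1 = [{"id": i, "a": i, "b": i} for i in range(1, 10001)]
t2 = [{"id": i, "a": i, "b": i} for i in range(1, 101)]

# idx_a on t1.a, modeled as a dict from an a value to the matching t1 rows.
t1_idx_a = {}
for row in t1:
    t1_idx_a.setdefault(row["a"], []).append(row)


def nested_loop_join(driving, driven_index, col):
    """NLJ: for each driving-table row, look up matches via the driven table's index."""
    rows_read = 0
    result = []
    for outer in driving:
        rows_read += 1                                  # one row read from the driving table
        for inner in driven_index.get(outer[col], []):  # index lookup on the driven table
            rows_read += 1                              # one row read from the driven table
            result.append((outer, inner))
    return result, rows_read


result, rows_read = nested_loop_join(t2, t1_idx_a, "a")
print(len(result), rows_read)  # 100 200
```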

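If the join column of the driven table has no index, NLJ performs poorly, and MySQL chooses the Block Nested-Loop Join algorithm instead.

(2) Block Nested-Loop Join (BNL)

The rows of the driving table are read into join_buffer. The driven table is then scanned, and each of its rows is compared with the data in join_buffer.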

mysql>   explain select * from t1 inner join t2 on t1.b = t2.b;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   100 |   100.00 | NULL                                               |
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10337 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

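Using join buffer (Block Nested Loop) in Extra indicates that the join uses the BNL algorithm.

The SQL above runs roughly as follows:

  1. Put all rows of t2 into join_buffer;
  2. Take each row of t1 and compare it with the data in join_buffer;
  3. Return the rows that satisfy the join condition.

The whole process does one full scan each of t1 and t2. The total number of rows scanned is 10000 (all rows of t1) + 100 (all rows of t2) = 10100. The data in join_buffer is unordered, so every row of t1 must be checked 100 times, giving 100 * 10000 = 1,000,000 comparisons in memory.

Question: when the join column of the driven table has no index, why choose BNL instead of NLJ?

If the second SQL above used NLJ, it would scan 100 * 10000 = 1,000,000 rows, and those are disk scans. With BNL it scans 10100 rows and performs 1,000,000 comparisons. BNL needs far fewer disk scans, and in-memory comparisons are much faster than disk scans. So for joins where the driven table's join column has no index, MySQL generally uses BNL. When an index exists, MySQL generally chooses NLJ, which then outperforms BNL.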
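The BNL numbers above (10100 rows scanned, 1,000,000 comparisons) can be reproduced the same way. This sketch assumes join_buffer is large enough to hold all of t2 in one block. The data and function name are hypothetical.

```python
t1 = [{"id": i, "a": i, "b": i} for i in range(1, 10001)]
t2 = [{"id": i, "a": i, "b": i} for i in range(1, 101)]


def block_nested_loop_join(driving, driven, col):
    """BNL: buffer the driving table, then scan the driven table once."""
    rows_read = 0
    comparisons = 0
    join_buffer = []
    for row in driving:              # step 1: load the driving table into join_buffer
        rows_read += 1
        join_buffer.append(row)
    result = []
    for inner in driven:             # step 2: one full scan of the driven table
        rows_read += 1
        for outer in join_buffer:    # join_buffer is unordered: compare with every row
            comparisons += 1
            if outer[col] == inner[col]:
                result.append((outer, inner))   # step 3: keep matching pairs
    return result, rows_read, comparisons


result, rows_read, comparisons = block_nested_loop_join(t2, t1, "b")
print(len(result), rows_read, comparisons)  # 100 10100 1000000
```

The cost is concentrated in the in-memory inner loop rather than in rows read from the table. That trade-off is what the question above is about.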

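Optimizing join queries:

  1. Index the join columns so that MySQL chooses NLJ for the join whenever possible.
  2. Let the small table drive the large table. If you know which table is smaller when writing a multi-table join, you can use straight_join to fix the driving order and save the optimizer the work of deciding.

straight_join: straight_join works like join, but makes the left table drive the right table, overriding the join order chosen by the optimizer. For example, select * from t2 straight_join t1 on t2.a = t1.a; tells MySQL to use t2 as the driving table.

(2.6) Optimizing in and exists

Principle: the small table drives the large table, i.e. the smaller data set drives the larger one.

(1) in: when the data set of B is smaller than that of A, in is better than exists (the form below suits this case).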

select * from A where id in (select id from B);

# Equivalent to the loop below, driven by B, because we want as few loop iterations as possible
for(select id from B){
    select * from A where A.id = B.id;
}

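Note: in both cases below, t2 is read first; the SQL inside in() is not necessarily executed first. MySQL applies many optimizations under the hood. Here both plans show select_type SIMPLE, meaning the subquery has been converted into a join.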

mysql> explain select * from t1 where id in (select id from t2);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | index  | PRIMARY       | idx_a   | 5       | NULL         |  100 |   100.00 | Using index |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t2.id |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> explain select * from t2 where id in (select id from t1);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | ALL    | PRIMARY       | NULL    | NULL    | NULL         |  100 |   100.00 | NULL        |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t2.id |    1 |   100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
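The `for` pseudocode for `in` can be made concrete. This sketch assumes id is the primary key of both tables, so B.id has no duplicates and needs no de-duplication. The tables are hypothetical dicts and lists, with A large and B small.

```python
# Hypothetical tables: A is large (10000 rows), B is small (100 rows); id is the primary key of both.
A = {i: {"id": i} for i in range(1, 10001)}   # A, indexed by its primary key
B = [{"id": i} for i in range(1, 101)]


def in_as_loop(a_by_id, b_rows):
    """select * from A where id in (select id from B), driven by the small table B."""
    iterations = 0
    result = []
    for b in b_rows:                  # outer loop over B
        iterations += 1
        row = a_by_id.get(b["id"])    # primary-key lookup into A
        if row is not None:
            result.append(row)
    return result, iterations


result, iterations = in_as_loop(A, B)
print(len(result), iterations)  # 100 100
```

The outer loop runs once per row of the small table. This is why `in` suits the case where B is smaller than A.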

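(2) exists: when the data set of A is smaller than that of B, exists is better than in.

Each row of the outer query A is passed into subquery B for a condition check. The result (true or false) decides whether that row of A is kept: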

select * from A where exists (select 1 from B where B.id = A.id);

# Equivalent to
for(select * from A){
    select * from B where B.id = A.id;
}
mysql> explain select * from t1 where exists (select 1 from t2 where t2.id = t1.id);
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref          | rows  | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------------+
|  1 | PRIMARY            | t1    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL         | 10337 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t1.id |     1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

mysql> explain select * from t2 where exists (select 1 from t1 where t2.id = t1.id);
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
|  1 | PRIMARY            | t2    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL         |  100 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t2.id |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
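The `exists` pseudocode can be sketched the same way. Here A is the small outer table, and B's primary key index is modeled as a Python set (an assumption). The subquery reduces to a true/false membership test.

```python
# Hypothetical tables: A is small (100 rows), B is large (10000 rows); id is the primary key of both.
A = [{"id": i} for i in range(1, 101)]
B_ids = set(range(1, 10001))          # B's primary key index, modeled as a set


def exists_as_loop(a_rows, b_ids):
    """select * from A where exists (select 1 from B where B.id = A.id), driven by A."""
    iterations = 0
    result = []
    for a in a_rows:                  # outer loop over A
        iterations += 1
        if a["id"] in b_ids:          # the subquery only answers true or false
            result.append(a)
    return result, iterations


result, iterations = exists_as_loop(A, B_ids)
print(len(result), iterations)  # 100 100
```

Here the outer loop runs once per row of A. This is why `exists` suits the case where A is the smaller table.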

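(2.7) count(*) query optimization

# To measure the real time of repeated executions, temporarily disable the MySQL query cache (MySQL 5.7; the query cache was removed in MySQL 8.0)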
set global query_cache_size=0;
set global query_cache_type=0;
mysql> explain select count(1) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(id) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(name) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

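The four SQL statements have the same execution plan, so they should perform about the same. The difference is that counting a specific column skips rows where that column is NULL. For example, if name has n NULL values, count(name) returns n fewer than count(*).

Note: count(name) scans the leaf nodes of the secondary index B+Tree and adds 1 for each non-NULL value.

Why does MySQL choose the secondary index rather than the clustered primary key index here? The secondary index stores less data than the primary key index: its leaf nodes hold only the indexed columns and the primary key, not the whole row. So scanning it should be faster, and for the same reason count(name) can be faster than count(id).

Rough ranking: count(*) ≈ count(1) > count(name) > count(id). For MySQL 5.7, count(*) is recommended. The MySQL manual states that InnoDB handles count(*) and count(1) the same way, with no performance difference.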
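The NULL-skipping difference is standard SQL behavior, so it can be demonstrated with SQLite standing in for MySQL. The four rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees (id, name) VALUES (?, ?)",
                 [(1, "LiLei"), (2, None), (3, "Lucy"), (4, None)])

# count(*) and count(1) count rows; count(col) counts only non-NULL values of col.
counts = conn.execute(
    "SELECT count(*), count(1), count(id), count(name) FROM employees").fetchone()
print(counts)  # (4, 4, 4, 2)
```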

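Common optimization methods:

(1) Query the total row count maintained by MySQL

For MyISAM tables, a count query without a WHERE clause is very fast. MyISAM stores the table's total row count on disk, so the query needs no computation.

# Create a table t3 with the same structure as t2, switch it to MyISAM, and insert 100 rows (the insert statement is not shown).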
mysql> create table t3 like t2;
mysql> alter table t3 engine='MyISAM';

mysql> select count(*) from t3;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql> explain select count(*) from t3;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

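For InnoDB tables, MySQL does not store the total row count, so count must be computed at query time.

(2) show table status

If you only need an estimate of the table's total row count, the following SQL is very fast.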

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   432824 |
+----------+

# Rows is only an estimate (432389 here, versus the exact count of 432824 above).
mysql> show table status like 'employees';
+-----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+-----------------+
| Name      | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment         |
+-----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+-----------------+
| employees | InnoDB |      10 | Dynamic    | 432389 |             49 |    21544960 |               0 |     22626304 |   9437184 |         432829 | 2020-12-15 21:21:40 | 2020-12-16 20:22:50 | NULL       | utf8_general_ci |     NULL |                | 员工记录表      |
+-----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+-----------------+
1 row in set (0.01 sec)

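(3) Keep the total count in Redis

When inserting or deleting rows, also update the table's row-count key in Redis (with the incr or decr command). This may be inaccurate, because it is hard to keep the table operation and the Redis operation transactionally consistent.

(4) Add a count table

When inserting or deleting rows, also update a count table, doing both in the same transaction.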
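The count-table approach relies on the row change and the counter change committing or rolling back together. This sketch shows that pattern with SQLite standing in for InnoDB (an assumption). The table `row_counts` and the helper functions are hypothetical. Python's `with conn:` commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, cnt INTEGER NOT NULL)")
conn.execute("INSERT INTO row_counts VALUES ('employees', 0)")
conn.commit()


def insert_employee(name):
    with conn:  # one transaction: both statements commit, or both roll back
        conn.execute("UPDATE row_counts SET cnt = cnt + 1 WHERE table_name = 'employees'")
        conn.execute("INSERT INTO employees (name) VALUES (?)", (name,))


def delete_employee(emp_id):
    with conn:
        cur = conn.execute("DELETE FROM employees WHERE id = ?", (emp_id,))
        conn.execute("UPDATE row_counts SET cnt = cnt - ? WHERE table_name = 'employees'",
                     (cur.rowcount,))


insert_employee("LiLei")
insert_employee("HanMeimei")
delete_employee(1)
try:
    insert_employee(None)  # violates NOT NULL, so the counter update is rolled back too
except sqlite3.IntegrityError:
    pass

cnt = conn.execute("SELECT cnt FROM row_counts WHERE table_name = 'employees'").fetchone()[0]
actual = conn.execute("SELECT count(*) FROM employees").fetchone()[0]
print(cnt, actual)  # 1 1
```

The failed insert leaves the counter unchanged because its increment was part of the same rolled-back transaction. This is the consistency the Redis approach cannot easily provide.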

Source: https://www.cnblogs.com/ddhhdd/p/14152616.html