mysql - Optimizing a query that extracts many columns from the same table

This is a follow-up to another question here on SO.

I have these two database tables (more tables omitted):

acquisitions (acq)
    id {PK}
    id_cu {FK}
    datetime
    { Unique Constraint: id_cu - datetime }

data
    id {PK}
    id_acq {FK acquisitions}
    id_meas
    id_elab
    value

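All the relevant id columns and datetime are indexed.

Of course I'm not going to change the DB structure, and I need to extract the data in this way:

> rows grouped by datetime
> each column holds the data.value for a selected acq.id_cu / data.id_meas / data.id_elab combination (see the note at the bottom of the post)
> NULLs are allowed where, for a given datetime, data is missing for some columns but present for others

My current query is built this way (see the SO question):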

SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1

UNION

SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6

UNION

SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8

) AS T
WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

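Only 3 columns are retrieved here, but as I said, there are usually more than 50.

It works perfectly, but I wonder whether it can be optimized for speed.

This is the MySQL EXPLAIN EXTENDED output for the query above: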

+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| id | select_type  | table        | type | possible_keys                                  | key                   | key_len | ref                    | rows  | filtered | Extra                                        |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL  | NULL                                           | NULL                  | NULL    | NULL                   | 82466 |   100.00 | Using where; Using temporary; Using filesort |
|  2 | DERIVED      | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | id_cu                 | 4       |                        | 18011 |   100.00 |                                              |
|  2 | DERIVED      | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
|  3 | UNION        | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | ix_acquisitions_id_cu | 4       |                        | 20864 |   100.00 |                                              |
|  3 | UNION        | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
|  4 | UNION        | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | id_cu                 | 4       |                        | 31848 |   100.00 |                                              |
|  4 | UNION        | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
| NULL | UNION RESULT | <union2,3,4> | ALL  | NULL                                           | NULL                  | NULL    | NULL                   |  NULL |     NULL |                                              |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
8 rows in set, 1 warning (8.24 sec)

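Currently (edit: checked today) there are 390k acquisitions and 9.2M data values (and growing), and it takes about 10 minutes to extract a table of 59 columns. I know, the previous software took up to 1 hour to extract the data.

Thank you for having the patience to read this far :)

Update

After Denis's answer, I tried his changes 1. and 2., and this is the new query: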

SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

UNION ALL

SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

UNION ALL

SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

) AS T GROUP BY datetime

Here is the new EXPLAIN EXTENDED:

+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type  | table        | type  | possible_keys                                                | key            | key_len | ref                    | rows  | filtered | Extra                           |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL   | NULL                                                         | NULL           | NULL    | NULL                   | 51997 |   100.00 | Using temporary; Using filesort |
|  2 | DERIVED      | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 14827 |   100.00 | Using where                     |
|  2 | DERIVED      | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  3 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 18663 |   100.00 | Using where                     |
|  3 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  4 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 13260 |   100.00 | Using where                     |
|  4 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
| NULL | UNION RESULT | <union2,3,4> | ALL   | NULL                                                         | NULL           | NULL    | NULL                   |  NULL |     NULL |                                 |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.01 sec)

Undoubtedly a nice gain in performance (8.24 sec down to 3.01 sec for the EXPLAIN).

Update (2)

This is with point 3. added:

EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

UNION ALL

SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

UNION ALL

SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

) AS T GROUP BY datetime;

This is the EXPLAIN EXTENDED result:

+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type  | table        | type  | possible_keys                                                | key            | key_len | ref                    | rows  | filtered | Extra                           |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL   | NULL                                                         | NULL           | NULL    | NULL                   | 51997 |   100.00 | Using temporary; Using filesort |
|  2 | DERIVED      | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 14827 |   100.00 | Using where                     |
|  2 | DERIVED      | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  3 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 18663 |   100.00 | Using where                     |
|  3 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  4 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 13260 |   100.00 | Using where                     |
|  4 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
| NULL | UNION RESULT | <union2,3,4> | ALL   | NULL                                                         | NULL           | NULL    | NULL                   |  NULL |     NULL |                                 |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.06 sec)

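Just slightly slower (3.06 sec versus 3.01 sec). Is it supposed to pay off with a large number of columns? I'll try it…

Update (3)

I tried with and without MAX(data.value) … GROUP BY datetime, and on a 60-column query I get better results with it. Results vary from one attempt to another; this is one of them.

> Original query: 9m12.144s
> With Denis's 1. and 2.: 4m6.597s
> With Denis's 1., 2. and 3.: 4m0.210s

The time required dropped by about 57%.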
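The percentage can be checked directly from the three timings listed above. The snippet below is only a small arithmetic sketch; the helper names `to_seconds` and `reduction_pct` are made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a 'XmY.ZZZs' timing into seconds."""
    return minutes * 60 + seconds


def reduction_pct(before, after):
    """Percentage of the original run time that was saved."""
    return (1 - after / before) * 100


original = to_seconds(9, 12.144)     # original query
steps_1_2 = to_seconds(4, 6.597)     # with Denis's 1. and 2.
steps_1_2_3 = to_seconds(4, 0.210)   # with Denis's 1., 2. and 3.

saved_1_2 = reduction_pct(original, steps_1_2)
saved_1_2_3 = reduction_pct(original, steps_1_2_3)

print(f"1. and 2.:     {saved_1_2:.1f}% less time")
print(f"1., 2. and 3.: {saved_1_2_3:.1f}% less time")
```

With all three changes the saving comes out at roughly 56.5%, which matches the "about 57%" quoted above; almost all of it already comes from changes 1. and 2.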

Update (4)

I tried Andiry's solution, but it is much slower than Denis's optimizations.

Tested on 3 combinations/columns:

> Not optimized: 1m3
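> Denis's optimizations: 1.7 sec
> Andiry's solution: 9.3 sec

I also tested 12 combinations/columns:

> Not optimized: not tested
> Denis's optimizations: 3.6 sec
> Andiry's solution: 13.7 sec

Moreover, Andiry's solution also fetches acquisition datetimes at which none of the selected combinations has data, but other combinations do.

Imagine control unit 1 acquiring data every 30 minutes at :00 and :30, and control unit 2 at :15 and :45: I get twice as many rows, with the empty ones filled with NULL.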
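The :00/:30 versus :15/:45 scenario can be reproduced on a toy dataset. The sketch below uses Python's built-in sqlite3 instead of MySQL, with a trimmed copy of the two tables and invented values. The second query is a hypothetical conditional-aggregation query written only to show how all-NULL rows can arise when acquisitions are not filtered by control unit; it is an assumption, not Andiry's actual query, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY,
    id_cu INTEGER NOT NULL,
    datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime)
);
CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL,
    id_elab INTEGER NOT NULL,
    value REAL
);
""")

# Unit 1 acquires at :00 and :30, unit 2 at :15 and :45 (invented sample data).
conn.executemany("INSERT INTO acquisitions VALUES (?, ?, ?)", [
    (1, 1, "2011-03-01 00:00:00"),
    (2, 2, "2011-03-01 00:15:00"),
    (3, 1, "2011-03-01 00:30:00"),
    (4, 2, "2011-03-01 00:45:00"),
])
# One value per acquisition, all with id_meas = 1 and id_elab = 1.
conn.executemany(
    "INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES (?, 1, 1, ?)",
    [(1, 10.0), (2, 20.0), (3, 11.0), (4, 21.0)],
)

# Only unit 1 is selected. Filtering in WHERE keeps just its two datetimes.
filtered = conn.execute("""
    SELECT acq.datetime, MAX(data.value) AS v1
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 1
    GROUP BY acq.datetime ORDER BY acq.datetime
""").fetchall()

# Hypothetical alternative: group every acquisition, pick the value with CASE.
unfiltered = conn.execute("""
    SELECT acq.datetime,
           MAX(CASE WHEN acq.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 1
                    THEN data.value END) AS v1
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    GROUP BY acq.datetime ORDER BY acq.datetime
""").fetchall()

print(len(filtered), "rows when filtering in WHERE")
print(len(unfiltered), "rows when grouping all acquisitions")
```

In this toy case the unfiltered query returns the two extra datetimes of unit 2 with NULL in the selected column, which is the doubling described above.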

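Note:

Everything is about a sensor system: there are several control units (one per id_cu), and each of them has many sensors.

A sensor is identified by an id_cu / id_meas pair and sends a different elaboration for each measurement, e.g. MIN (id_elab = 1), MAX (id_elab = 2), AVERAGE (id_elab = 3), INSTANT (id_elab = …) and so on, one per id_elab.

The user is free to choose whichever elaborations they want, for example:

> the average (3) of sensor #3 of control unit #1 for one result column, hence id_cu = 1 / id_meas = 3 / id_elab = 3
> the average (3) of sensor #5 of control unit #1 for one result column, hence id_cu = 1 / id_meas = 5 / id_elab = 3
> the minimum (1) of sensor #2 of control unit #4 for another column, hence id_cu = 4 / id_meas = 2 / id_elab = 1
> (any valid id_cu, id_meas, id_elab combination)
> …

And so on, up to dozens of choices…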
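Since the list of combinations is chosen by the user and can run to dozens of columns, the query text has to be generated. The following is a minimal sketch of such a generator, not the code actually used for the post: the function name `build_pivot_query` and the `placeholder` argument are made up, and `%s` is assumed as the parameter marker (common for MySQL drivers in Python; adjust for yours). It emits the UNION ALL shape with the date filter and MAX … GROUP BY inside each branch, as in Update (2).

```python
def build_pivot_query(combos, start, end, placeholder="%s"):
    """Build the UNION ALL pivot query, one output column per combination.

    combos is a list of (id_cu, id_meas, id_elab) tuples. Returns (sql, params)
    so that all values are passed as bound parameters, not pasted into the SQL.
    """
    if not combos:
        raise ValueError("at least one combination is required")
    n = len(combos)
    ph = placeholder
    branches, params = [], []
    for pos, (id_cu, id_meas, id_elab) in enumerate(combos, start=1):
        # Column `pos` carries the value, every other column is NULL.
        cols = ", ".join(
            f"MAX(data.value) AS v{i}" if i == pos else f"NULL AS v{i}"
            for i in range(1, n + 1)
        )
        branches.append(
            f"SELECT acq.datetime AS datetime, {cols}\n"
            "FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq\n"
            f"WHERE acq.id_cu = {ph} AND data.id_meas = {ph} AND data.id_elab = {ph}\n"
            f"AND acq.datetime >= {ph} AND acq.datetime <= {ph}\n"
            "GROUP BY acq.datetime"
        )
        params += [id_cu, id_meas, id_elab, start, end]
    outer_cols = ", ".join(f"MAX(v{i}) AS v{i}" for i in range(1, n + 1))
    sql = (
        f"SELECT datetime, {outer_cols} FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n) AS T GROUP BY datetime"
    )
    return sql, params


# The three example selections from the note above.
combos = [(1, 3, 3), (1, 5, 3), (4, 2, 1)]
sql, params = build_pivot_query(combos, "2011-03-01 00:00:00", "2011-04-30 23:59:59")
print(sql)
```

The returned `sql` and `params` would then be handed to the driver's `cursor.execute(sql, params)`. Keeping the ids and dates as bound parameters avoids quoting problems when the selection comes from user input.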

This is the partial DDL (irrelevant tables omitted):

CREATE TABLE acquisitions (
    id INTEGER NOT NULL AUTO_INCREMENT, 
    id_cu INTEGER NOT NULL, 
    datetime DATETIME NOT NULL, 
    PRIMARY KEY (id), 
    UNIQUE (id_cu, datetime), 
    FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
)

CREATE TABLE data (
    id INTEGER NOT NULL AUTO_INCREMENT, 
    id_acq INTEGER NOT NULL, 
    id_meas INTEGER NOT NULL, 
    id_elab INTEGER NOT NULL, 
    value FLOAT, 
    PRIMARY KEY (id), 
    FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
)

CREATE TABLE ctrl_units (
    id INTEGER NOT NULL, 
    name VARCHAR(40) NOT NULL, 
    PRIMARY KEY (id)
)

CREATE TABLE sensors (
    id_cu INTEGER NOT NULL, 
    id_meas INTEGER NOT NULL, 
    id_elab INTEGER NOT NULL, 
    name VARCHAR(40) NOT NULL, 
    `desc` VARCHAR(80), 
    PRIMARY KEY (id_cu, id_meas), 
    FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
)

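Solution:

There are three main issues:

> Use UNION ALL instead of UNION. You are grouping and taking the min/max anyway, so introducing a sort step that removes duplicate rows is pointless.
> The WHERE clause can be placed inside each UNION sub-statement: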

select ...
from (
select ... from ...  where ...
union all
select ... from ...  where ...
union all
...
) as t
group by ...

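The way it was written, it starts by fetching all rows, appends them all together, and only then filters out the rows it needs. Injecting the WHERE clause into each UNION sub-statement makes it fetch only the needed rows first, and then append those.
> In the same vein, pre-aggregate the aggregates: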

select ..., max(foo) as foo
from (
select ..., max(foo) as foo from ...  where ... group by ...
union all
select ..., max(foo) as foo from ...  where ... group by ...
union all
...
) as t
group by ...

The optimizer will make better use of the existing indexes, and you'll end up appending a few rows rather than millions.
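The three changes are meant to alter only how the work is done, not the result. As a sanity check, the sketch below runs the original shape (UNION, filter on the outer query) and the rewritten shape (UNION ALL, filter and MAX … GROUP BY inside each branch) against a tiny invented dataset and compares the rows. It uses Python's built-in sqlite3 rather than MySQL, so it says nothing about speed; the table contents are made up, and an ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY,
    id_cu INTEGER NOT NULL,
    datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime)
);
CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL,
    id_elab INTEGER NOT NULL,
    value REAL
);
""")
conn.executemany("INSERT INTO acquisitions VALUES (?, ?, ?)", [
    (1, 3, "2011-03-01 00:00:00"),
    (2, 5, "2011-03-01 00:00:00"),
    (3, 3, "2011-03-01 00:30:00"),  # unit 5 has nothing at 00:30
    (4, 3, "2011-05-01 00:00:00"),  # outside the requested window
])
conn.executemany(
    "INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES (?, ?, ?, ?)",
    [(1, 2, 1, 10.0), (2, 4, 6, 20.0), (3, 2, 1, 11.0), (4, 2, 1, 99.0)],
)

START, END = "2011-03-01 00:00:00", "2011-04-30 23:59:59"

# Original shape: UNION, date filter applied after the branches are combined.
original = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM (
    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
    UNION
    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
) AS T
WHERE datetime >= ? AND datetime <= ?
GROUP BY datetime ORDER BY datetime
"""

# Rewritten shape: UNION ALL, filter and pre-aggregation inside each branch.
optimized = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM (
    SELECT acq.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
      AND acq.datetime >= ? AND acq.datetime <= ?
    GROUP BY acq.datetime
    UNION ALL
    SELECT acq.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
      AND acq.datetime >= ? AND acq.datetime <= ?
    GROUP BY acq.datetime
) AS T
GROUP BY datetime ORDER BY datetime
"""

rows_original = conn.execute(original, (START, END)).fetchall()
rows_optimized = conn.execute(optimized, (START, END, START, END)).fetchall()
print(rows_original == rows_optimized, rows_optimized)
```

Both forms return one row per datetime inside the window, with NULL where a selected combination has no value.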

Tags: query-optimization, sql, mysql
Source: https://codeday.me/bug/20191102/1993019.html