mysql - Optimizing a query that extracts multiple columns from the same table
This is a follow-up to another question here on SO.
I have these two database tables (more tables omitted):
acquisitions (acq)
id {PK}
id_cu {FK}
datetime
{ Unique Constraint: id_cu - datetime }
data
id {PK}
id_acq {FK acquisitions}
id_meas
id_elab
value
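All the ids and the datetime column are indexed.
Of course, I am not going to change the database structure. I need to extract the data this way:
> rows grouped by datetime
> each column holds the data.value for one selected acq.id_cu / data.id_meas / data.id_elab combination (see the note at the bottom of the post)
> NULLs are allowed when, for a given datetime, data is missing for some columns but present for others
My current query is built this way (see the SO question):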
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
UNION
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
UNION
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
) AS T
WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
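Only 3 columns are retrieved here, but as I said, there are usually more than 50.
It works perfectly, but I would like to know whether it can be optimized for speed.
This is the MySQL EXPLAIN EXTENDED for the query above: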
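To make the shape of the result concrete, here is a minimal sketch that runs the same UNION / MAX pattern on toy data. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is runnable), two columns instead of three, and invented sample values.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY,
    id_cu INTEGER NOT NULL,
    datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime)
);
CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL,
    id_elab INTEGER NOT NULL,
    value REAL
);
-- Invented rows: unit 3 reports at 00:00 and 00:30, unit 5 only at 00:00.
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 3, '2011-03-01 00:00:00'),
    (2, 3, '2011-03-01 00:30:00'),
    (3, 5, '2011-03-01 00:00:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 2, 1, 10.5),
    (2, 2, 1, 11.0),
    (3, 4, 6, 99.0);
""")

# Same structure as the query above, reduced to two result columns.
PIVOT_SQL = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM (
    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
    UNION
    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
) AS T
WHERE datetime >= '2011-03-01 00:00:00' AND datetime <= '2011-04-30 23:59:59'
GROUP BY datetime
ORDER BY datetime
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Each datetime yields one row. MAX() ignores the NULL placeholders, so a column stays NULL (None in Python) only when its combination has no value at that datetime.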
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 82466 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 18011 | 100.00 | |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | ix_acquisitions_id_cu | 4 | | 20864 | 100.00 | |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 31848 | 100.00 | |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
8 rows in set, 1 warning (8.24 sec)
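Currently (edit: checked today) there are 390k acquisitions and 9.2M data values (and growing), and it takes about 10 minutes to extract a 59-column table. I know, the previous software took up to 1 hour to extract the data.
Thanks for your patience in reading this far :)
Update
After Denis's answer, I tried his changes 1. and 2., and this is the new query: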
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
) AS T GROUP BY datetime
And here is the new EXPLAIN EXTENDED:
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.01 sec)
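No doubt a good gain in performance: the EXPLAIN EXTENDED itself went from 8.24 sec to 3.01 sec.
Update (2)
This is with point 3. added: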
EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
) AS T GROUP BY datetime;
This is the result of EXPLAIN EXTENDED:
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.06 sec)
Just a little slower. Should it pay off with a large number of columns? I'll try it…
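Update (3)
I tried with and without MAX(data.value) … GROUP BY datetime, and I get better results on the 60-column query. Results vary from run to run; this is one of them.
> original query: 9m12.144s
> with Denis's 1. and 2.: 4m6.597s
> with Denis's 1., 2. and 3.: 4m0.210s
The time required is reduced by about 57%.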
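The reduction can be checked directly from the three timings listed above. The helper names below (to_seconds, reduction_pct) are only illustrative.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a shell-style 'XmY.YYYs' timing to seconds."""
    return minutes * 60 + seconds


def reduction_pct(before: float, after: float) -> float:
    """Percentage of run time saved going from 'before' to 'after'."""
    return (1 - after / before) * 100


original = to_seconds(9, 12.144)    # original query
with_1_2 = to_seconds(4, 6.597)     # Denis's changes 1. and 2.
with_1_2_3 = to_seconds(4, 0.210)   # Denis's changes 1., 2. and 3.

saved_1_2 = reduction_pct(original, with_1_2)
saved_1_2_3 = reduction_pct(original, with_1_2_3)

print(f"changes 1. and 2.:     {saved_1_2:.1f}% less time")
print(f"changes 1., 2. and 3.: {saved_1_2_3:.1f}% less time")
```

Both figures land in the mid-fifties, so nearly all of the gain comes from changes 1. and 2., and pre-aggregation adds only about one percentage point in this run.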
Update (4)
I tried Andiry's solution, but it is much slower than Denis's optimization.
Tested on 3 combinations/columns:
> not optimized: about 1 minute
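> Denis's optimization: 1.7 sec
> Andiry's case: 9.3 sec
I also tested 12 combinations/columns:
> not optimized: not tested
> Denis's optimization: 3.6 sec
> Andiry's case: 13.7 sec
Moreover, Andiry's solution also returns acquisition datetimes at which none of the selected combinations has data but other combinations do.
Imagine control unit 1 acquires data every 30 minutes at :00 and :30, while control unit 2 acquires at :15 and :45: I would get twice as many rows, with the empty ones filled with NULLs.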
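Note:
Everything relates to a sensor system: there are several control units (one per id_cu), each with many sensors.
A sensor is identified by the pair id_cu / id_meas and sends several elaborations of each measurement, e.g. MIN (id_elab = 1), MAX (id_elab = 2), AVERAGE (id_elab = 3), instant (id_elab = …) and so on, one per id_elab.
The user is free to choose which elaborations to extract, for example:
> the average (3) of sensor #3 of control unit #1 as one result column, so id_cu = 1 / id_meas = 3 / id_elab = 3
> the average (3) of sensor #5 of control unit #1 as one result column, so id_cu = 1 / id_meas = 5 / id_elab = 3
> the minimum (1) of sensor #2 of control unit #4 as another column, so id_cu = 4 / id_meas = 2 / id_elab = 1
> (any valid combination of id_cu, id_meas, id_elab)
> …
and so on, up to dozens of choices…
Here is the partial DDL (unrelated tables omitted):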
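Since the column list is user-driven, a query like the ones above has to be generated. Here is a minimal sketch of one way to do that from a list of (id_cu, id_meas, id_elab) tuples. The function name build_pivot_query, the Combo alias and the placeholder argument are all hypothetical. The default "?" placeholder suits sqlite3, and MySQL drivers commonly expect "%s" instead, so treat it as an assumption to adapt.

```python
from typing import List, Sequence, Tuple

Combo = Tuple[int, int, int]  # (id_cu, id_meas, id_elab), hypothetical alias


def build_pivot_query(
    combos: Sequence[Combo], start: str, end: str, placeholder: str = "?"
) -> Tuple[str, List[object]]:
    """Return (sql, params) with one UNION ALL branch per selected combination."""
    if not combos:
        raise ValueError("at least one combination is required")
    n = len(combos)
    p = placeholder
    branches: List[str] = []
    params: List[object] = []
    for i, (id_cu, id_meas, id_elab) in enumerate(combos):
        # Branch i fills only column v(i+1); every other column is NULL.
        cols = ", ".join(
            f"data.value AS v{j + 1}" if j == i else f"NULL AS v{j + 1}"
            for j in range(n)
        )
        branches.append(
            f"SELECT acq.datetime AS datetime, {cols}\n"
            "FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq\n"
            f"WHERE acq.id_cu = {p} AND data.id_meas = {p} AND data.id_elab = {p}\n"
            f"AND acq.datetime >= {p} AND acq.datetime <= {p}"
        )
        params.extend([id_cu, id_meas, id_elab, start, end])
    outer = ", ".join(f"MAX(v{j + 1}) AS v{j + 1}" for j in range(n))
    sql = (
        f"SELECT datetime, {outer} FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n) AS T GROUP BY datetime"
    )
    return sql, params


# The three example selections listed in the note.
sql, params = build_pivot_query(
    [(1, 3, 3), (1, 5, 3), (4, 2, 1)],
    "2011-03-01 00:00:00",
    "2011-04-30 23:59:59",
)
print(sql)
```

Passing the ids and the date range as bound parameters keeps the generated text free of user input. Only the column count changes the shape of the SQL.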
CREATE TABLE acquisitions (
id INTEGER NOT NULL AUTO_INCREMENT,
id_cu INTEGER NOT NULL,
datetime DATETIME NOT NULL,
PRIMARY KEY (id),
UNIQUE (id_cu, datetime),
FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
);
CREATE TABLE data (
id INTEGER NOT NULL AUTO_INCREMENT,
id_acq INTEGER NOT NULL,
id_meas INTEGER NOT NULL,
id_elab INTEGER NOT NULL,
value FLOAT,
PRIMARY KEY (id),
FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
);
CREATE TABLE ctrl_units (
id INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE sensors (
id_cu INTEGER NOT NULL,
id_meas INTEGER NOT NULL,
id_elab INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
`desc` VARCHAR(80),
PRIMARY KEY (id_cu, id_meas),
FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
);
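Solution:
There are three main issues:
> Use UNION ALL rather than UNION. You are grouping and taking the min/max anyway, so introducing a sort step to remove duplicate rows is pointless.
> The WHERE clause can be placed in each union sub-statement: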
select ...
from (
select ... from ... where ...
union all
select ... from ... where ...
union all
...
) as t
group by ...
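As written, the query starts by fetching all rows, appends them all together, and only then filters out the rows it needs. Injecting the WHERE clause into the union sub-statements makes each one fetch only the needed rows before they are appended.
> In the same vein, pre-aggregate inside each sub-statement: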
select ..., max(foo) as foo
from (
select ..., max(foo) as foo from ... where ... group by ...
union all
select ..., max(foo) as foo from ... where ... group by ...
union all
...
) as t
group by ...
The optimizer will make better use of the existing indexes, and you end up appending only a few rows instead of millions.
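As a sanity check that the three changes do not alter the result, here is a minimal sketch that runs the original shape and the rewritten shape side by side on toy data. It uses Python's sqlite3 in-memory database as a stand-in for MySQL (an assumption), and the sample rows are invented. It only compares results. It says nothing about timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE acquisitions (id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL,
                           datetime TEXT NOT NULL, UNIQUE (id_cu, datetime));
CREATE TABLE data (id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL,
                   id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 3, '2011-02-28 23:30:00'),  -- outside the requested range
    (2, 3, '2011-03-01 00:00:00'),
    (3, 3, '2011-03-01 00:30:00'),
    (4, 5, '2011-03-01 00:00:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 2, 1, 1.0), (2, 2, 1, 10.5), (3, 2, 1, 11.0), (4, 4, 6, 99.0);
""")

DATE_RANGE = ("2011-03-01 00:00:00", "2011-04-30 23:59:59")

# Original shape: UNION, with the date filter applied after the union.
ORIGINAL_SQL = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM (
    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
    UNION
    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
) AS T
WHERE datetime >= ? AND datetime <= ?
GROUP BY datetime ORDER BY datetime
"""

# Rewritten shape: UNION ALL, filter and pre-aggregation inside each branch.
OPTIMIZED_SQL = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM (
    SELECT acq.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
      AND acq.datetime >= ? AND acq.datetime <= ?
    GROUP BY acq.datetime
    UNION ALL
    SELECT acq.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
      AND acq.datetime >= ? AND acq.datetime <= ?
    GROUP BY acq.datetime
) AS T
GROUP BY datetime ORDER BY datetime
"""

original_rows = conn.execute(ORIGINAL_SQL, DATE_RANGE).fetchall()
# The rewritten query binds the date range once per branch.
optimized_rows = conn.execute(OPTIMIZED_SQL, DATE_RANGE * 2).fetchall()

print(original_rows == optimized_rows)
for row in optimized_rows:
    print(row)
```

The outer MAX … GROUP BY is still needed after pre-aggregating, because each branch contributes its own row for a datetime and those rows have to be merged into one.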
Tags: query-optimization, sql, mysql Source: https://codeday.me/bug/20191102/1993019.html