首页 > 数据库> > mysql之重复记录查询及去重

mysql之重复记录查询及去重

2021-06-08 12:02:16 作者：互联网

0. 前言

在实际的数据库设计过程中，很少通过索引组合键唯一索引来保证数据唯一性。很多是通过代码逻辑（先查询再插入）去保证唯一性，这个在并发很少的情况下是没有问题的，一般用在配置类的数据库表。不过在一些并发高的场景下，就会出现多线程问题，因为先查询再插入并非原子操作，另外当程序出错情况也会导致在数据库里存在重复记录。这个时候程序员就会被叫来背锅，首要的当然是处理掉已产生的重复数据保证业务正常运行，再排查问题原因。本文主要以mysql为例记录如何处理重复记录。

1. 查找重复记录

select * from 表 where 重复字段 in (select 重复字段 from 表 group by 
重复字段 having count(*)>1)

一个重复字段，以商家表中的qq重复字段处理为例：

select * from owner where qq in (select qq from owner group by qq
 having count(*) > 1)

多个重复字段，以商品的sku表中商品id、商品sku_code两个重复字段处理为例：

select * from sku where (product_id,sku_code) in (select product_id ,
 sku_code from sku group by product_id, sku_code having count(*) > 1)

2. 删除重复记录

网上很多版本：

delete from 表 where id not in (select min(id) from 表 group by 重复字段 having 
count(*)>1) and 重复字段 in （select 重复字段 from 表 group by 重复字段 having 
count(*)>1）

但是按这种处理，会发现在mysql中会报错“You can’t specify target table ‘pms_sku’ for update in FROM clause”意思就是“不能在同一个sql语句中，先select同一个表的某些值，然后再update这个表”
正常的做法应该是：

delete from  表 where id in (select id from (select  * from 表 where id not in (select min(id) from 表 group by 重复字段 having count(*)>1) and 重复字段 in （select 重复字段 from 表 group by 重复字段 having count(*)>1）
) 临时表）

也就是多嵌套一层select出来一个临时表
这里同样以删除sku 中以商品id、商品sku_code两个重复字段处理为例：

delete from sku WHERE id in(select id from (select * FROM sku WHERE 
(product_id,sku_code) IN (SELECT product_id,sku_code FROM sku GROUP BY 
product_id,sku_code HAVING COUNT(*)>1) and id not in (select min(id) from sku 
group by product_id,sku_code having count(*)>1)) a)

3. 最后

处理重复记录的方法，你学会了吗？当然如果业务明确，这些键是唯一的标识，可以建立唯一索引从数据库层面保证数据的唯一性。

标签：sku,code,group,重复,查询,mysql,重复记录,id,select
来源： https://blog.csdn.net/m0_37774696/article/details/117695298