数据库
首页 > 数据库> > Python-MySQL抽取表中每个分组的前N个数据

Python-MySQL抽取表中每个分组的前N个数据

作者:互联网

在超大规模数据处理时,有时候需要按分组取出一部分数据,作为整体数据的参考。
例如,有表table_1,按column_1可以划分为200组,现在我们希望从这200组中每组各取出10条记录作为代表。
我们可以通过以下Python语句操作MySQL,来实现此功能:

import mysql.connector
conn = mysql.connector.connect(host = '127.0.0.1', port = 3306, user = 'root', password = 'xxx', db = 'xxx' )
cursor = conn.cursor()
for i in range(0,200): #注意limit i,j 时 i是从0开始索引的
    sql = "INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit %s,1) p where t.column1=p.column1 limit 10;"%i
    cursor.execute(sql)
    conn.commit()
conn.close()

当然,由于这是个插入操作,我们也可以使用excutemany来实现:

import mysql.connector
conn = mysql.connector.connect(host = '127.0.0.1', port = 3306, user = 'root', password = 'xxx', db = 'xxx' )
cursor = conn.cursor()
cursor.executemany("INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit %s,1) p where t.column1=p.column1 limit 10;", [(i,) for i in range(0,200)] ) #注意必须保留(i,)中的",",否则会报错
conn.commit()
conn.close()

再或者,也可以使用MySQL存储过程。

import mysql.connector
conn = mysql.connector.connect(host = '127.0.0.1', port = 3306, user = 'root', password = 'xxx', db = 'xxx' )
cursor = conn.cursor()
sql1="drop procedure if exists test;"
sql2="create procedure test() \
    begin\
    declare i bigint default 0; \
    DECLARE var BIGINT DEFAULT 0; \
    while i < 200 do \
    INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit i,1) p where t.column1=p.column1 limit 10;\
    set i = i + 1;\
    end while;\
    end;"
sql3="call test;"
cursor.execute(sql1)
cursor.execute(sql2)
cursor.execute(sql3)
conn.commit()
conn.close()

就执行效率来说,excutemany较excute略高一些,存储过程的效率最低。

标签:Python,MySQL,cursor,column,limit,表中,table,column1,conn
来源: https://blog.csdn.net/zbgjhy88/article/details/89077340