
【TPC-DS】Trino + S3 + Hive + PostgreSQL performance testing: querying and writing data (part 5)

2021-11-23 16:03:20 Author: Internet

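【Query】

The TPC-DS query SQL files are stored in the following directory (for convenience, the query and write scripts are kept in the same directory):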

cd /root/trino/tpcds-kit/TpcdsData/script/sample-queries

Remember to upload the sample-queries files to this directory.

Method one: validate the query SQL in the database (checks that the SQL is correct)

1. Open a Trino CLI session against the postgresql catalog

[root@cluster-data-node-01 ~]# /root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --schema public

2. Copy the SQL statement from query1.sql, run it at the prompt, and check that it returns a result.

Method two: run a single SQL file from the command line and record its query time

To check whether a single file, query1.sql, runs successfully and to record its query time, add select now(); before and after the SQL statement in query1.sql.
Example contents of query1.sql:

select now();
with customer_total_return as
(select sr_customer_sk as ctr_customer_sk
,sr_store_sk as ctr_store_sk
,sum(SR_FEE) as ctr_total_return
from store_returns
,date_dim
where sr_returned_date_sk = d_date_sk
and d_year =2000
group by sr_customer_sk
,sr_store_sk)
 select  c_customer_id
from customer_total_return ctr1
,store
,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
from customer_total_return ctr2
where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
and s_store_sk = ctr1.ctr_store_sk
and s_state = 'NM'
and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;
select now();

3. Run the file from the command line and check the result

/root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --schema public -f /root/trino/tpcds-kit/TpcdsData/script/sample-queries/query1.sql
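
The elapsed time can be derived from the two now() rows instead of being worked out by hand. The following is a minimal Python sketch, assuming the CLI prints each timestamp in the form 2021-11-23 16:03:20.125 Asia/Shanghai (possibly wrapped in double quotes). The helper names parse_now and elapsed_seconds and the sample values are illustrative and do not come from a real run.

```python
from datetime import datetime


def parse_now(line):
    """Parse one `select now()` output line into a naive datetime.

    Assumed format: "YYYY-MM-DD HH:MM:SS.mmm <zone>", optionally double-quoted.
    The zone is ignored because both timestamps come from the same session.
    """
    text = line.strip().strip('"')
    date_part, time_part = text.split()[:2]
    return datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S.%f")


def elapsed_seconds(first_line, last_line):
    """Seconds between the first and the last now() line of the output."""
    return (parse_now(last_line) - parse_now(first_line)).total_seconds()


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    start = '"2021-11-23 16:03:20.125 Asia/Shanghai"'
    end = '"2021-11-23 16:03:32.625 Asia/Shanghai"'
    print("query1.sql took %.3f s" % elapsed_seconds(start, end))
```

If the CLI output format differs in your Trino version, only parse_now needs to change.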

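【Important】 At this point each query is run separately, and its query time is not directly visible: the time of every SQL statement has to be worked out by hand, which is slow and laborious. This raises several questions. How can all the SQL files be run in one go? How can the query time of each file be obtained in seconds? Can each time be labelled with the query it belongs to? Method three below addresses these points.

Method three: run all the query SQL files from the command line and record the query times

1. Create a .py file, for example bath_query_time.py, with the following contents: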

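#!/usr/bin/python
# -*- coding: utf-8 -*-
# Run every TPC-DS query file through the Trino CLI and log the elapsed time of each.
# Written to run under both Python 2.7 and Python 3.
from __future__ import print_function

import os
import subprocess
import time

try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2


def run(cmd):
    """Run a shell command; return 0 on success and -1 on failure."""
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # communicate() waits for the process to exit, so no separate wait() is needed.
    stdout, stderr = p.communicate()
    if p.returncode != 0:
        output = stdout + stderr
        if not isinstance(output, str):  # bytes under Python 3
            output = output.decode("utf-8", "replace")
        print("Bad rc (%s) for cmd '%s': %s" % (p.returncode, cmd, output))
        return -1
    return 0


def run_one_presqlsql(sql):
    """Run one query file (a file name inside the query directory).

    Returns the elapsed wall-clock seconds, or -1 if the command failed.
    """
    start_time = time.time()
    # Alternative targets (Trino on Kubernetes, a MySQL-protocol server), kept for reference:
    #run_presql_cmd="kubectl exec -it pod/trino-cli -- trino --server trino:8080 --catalog hive --schema default --execute \"%s\"" %(sql)
    run_presql_cmd = "/root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --schema public -f /root/trino/tpcds-kit/TpcdsData/script/sample-queries/%s" % (sql)
    #run_presql_cmd="cat /opt/tpcds/new_tpcds_queries/tpcds-presto/%s | mysql -h 10.201.0.204 -u root -P 9030 -p123456 sf500_02 " %(sql)
    if run(run_presql_cmd) != 0:
        return -1
    return time.time() - start_time


def get_all_queries_time_use_file(queries_dirs):
    """Time every .sql file in queries_dirs and write one log line per successful query."""
    logfilename = "/tmp/tpcds_time.log"
    with open(logfilename, 'w+') as logfile:
        # Sort for a repeatable order and skip non-SQL files such as this script.
        for fstr in sorted(os.listdir(queries_dirs)):
            if not fstr.endswith(".sql"):
                continue
            run_time = run_one_presqlsql(fstr)
            if run_time != -1:
                print("query:%s,time:%.2f" % (fstr, float(run_time)), file=logfile)
                logfile.flush()  # make progress visible while the batch is running


def get_all_queries_time(queries_dirs):
    """Variant, not called below: prefix each query with "use <schema>;" and pass the text via --execute."""
    logfilename = "/tmp/tpcds_time.log"
    usedb = "tpcds"
    with open(logfilename, 'w+') as logfile:
        for fstr in sorted(os.listdir(queries_dirs)):
            if not fstr.endswith(".sql"):
                continue
            with open(os.path.join(os.path.abspath(queries_dirs), fstr), 'r') as f:
                strbuf = "use %s; \n" % (usedb) + f.read()
            # The SQL text is quoted for the shell; run_one_presqlsql() only accepts file names.
            cmd = "/root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --execute %s" % quote(strbuf)
            start_time = time.time()
            if run(cmd) == 0:
                run_time = time.time() - start_time
                print("schema:%s,query:%s,time:%.2f" % (usedb, fstr, run_time), file=logfile)


if __name__ == "__main__":
    get_all_queries_time_use_file("/root/trino/tpcds-kit/TpcdsData/script/sample-queries/")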

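【Important】 Adjust the following items in the file for your environment:

run_presql_cmd: the Trino CLI command, including the directory that holds the SQL query files
logfilename = "/tmp/tpcds_time.log" # path of the log file, which records the query times
get_all_queries_time_use_file: the path of the directory that holds the queries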

2. Run the .py file

python bath_query_time.py

nohup python bath_query_time.py &  # run the script in the background

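3. View the results

The queries can be inspected in the Trino web UI at http://10.201.0.125:8080/ui/. The per-query times are recorded in /tmp/tpcds_time.log.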
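
The log written by the script contains one query:<file>,time:<seconds> line per successful query, so it can be summarised with a few lines of code. The following is a minimal Python 3 sketch, assuming that log format and the /tmp/tpcds_time.log path used above. The function names parse_log_line and summarize are illustrative.

```python
import os


def parse_log_line(line):
    """Turn 'query:query1.sql,time:12.34' into ('query1.sql', 12.34)."""
    fields = dict(part.split(":", 1) for part in line.strip().split(","))
    return fields["query"], float(fields["time"])


def summarize(lines):
    """Return (total_seconds, timings sorted slowest first) for the given log lines."""
    timings = [parse_log_line(line) for line in lines if line.strip()]
    total = sum(seconds for _, seconds in timings)
    ranked = sorted(timings, key=lambda item: item[1], reverse=True)
    return total, ranked


if __name__ == "__main__":
    log_path = "/tmp/tpcds_time.log"  # path written by bath_query_time.py
    if os.path.exists(log_path):
        with open(log_path) as log:
            total, ranked = summarize(log)
        print("queries: %d, total: %.2f s" % (len(ranked), total))
        for name, seconds in ranked[:5]:
            print("%-20s %8.2f s" % (name, seconds))
```

Queries that failed are not written to the log, so comparing the number of logged queries with the number of .sql files shows how many did not complete.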

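【Write】

The procedure is the same as for queries; see method three above. Create a file named bath_insert_time.py, copy the script from method three of the query section into it, and modify the run_presql_cmd line.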
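
As an illustration of what the modified run_presql_cmd might look like, the following Python 3 sketch builds the command for one write script. The directory /path/to/insert-sql is a hypothetical placeholder for wherever the write scripts are stored, and build_insert_cmd is an illustrative helper, not part of the original script. The CLI path, server address, catalog and schema are the ones used earlier in this article.

```python
import os
import shlex

TRINO_CLI = "/root/trino/trino-server-363/trino"  # CLI path used in this article
SERVER = "10.201.0.125:8080"                      # coordinator address used in this article

# Hypothetical directory holding the write scripts such as call_center.sql.
INSERT_DIR = "/path/to/insert-sql"


def build_insert_cmd(sql_file, catalog="postgresql", schema="public"):
    """Build the Trino CLI command line that runs one SQL file from INSERT_DIR."""
    sql_path = os.path.join(INSERT_DIR, sql_file)
    args = [TRINO_CLI, "--server", SERVER,
            "--catalog", catalog, "--schema", schema,
            "-f", sql_path]
    # Quote every argument so that unusual file names stay safe under shell=True.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_insert_cmd("call_center.sql"))
```

The returned string can be assigned to run_presql_cmd inside run_one_presqlsql, which leaves the timing and logging code unchanged.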

Example: view the contents of call_center.sql

Source: https://www.cnblogs.com/syw20170419/p/15593592.html