【TPC-DS】trino + S3 + hive + postgresql performance testing: querying and writing data (Part 5)
Author: Internet
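【Query】
The TPC-DS query SQL files are stored at the path below. (For convenience, the query and write scripts are kept in the same directory.)
cd /root/trino/tpcds-kit/TpcdsData/script/sample-queries
Remember to upload the sample-queries files to this location.

Method 1: Verify the query SQL in the database [checks that the SQL is correct]
1. Connect to the postgresql database through the trino CLI:
[root@cluster-data-node-01 ~]# /root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --schema public
2. Copy the SQL statement from query1.sql, run it at the prompt, and check the result.

Method 2: Run a single SQL file from the command line and get its query time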
- To check whether a single SQL file such as query1.sql runs successfully, and to record how long it takes, add select now() before and after the SQL statement in query1.sql.
- Example contents of query1.sql:
select now();
with customer_total_return as
  (select sr_customer_sk as ctr_customer_sk
         ,sr_store_sk as ctr_store_sk
         ,sum(SR_FEE) as ctr_total_return
   from store_returns
       ,date_dim
   where sr_returned_date_sk = d_date_sk
     and d_year = 2000
   group by sr_customer_sk
           ,sr_store_sk)
select c_customer_id
from customer_total_return ctr1
    ,store
    ,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
                               from customer_total_return ctr2
                               where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  and s_store_sk = ctr1.ctr_store_sk
  and s_state = 'NM'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;
select now();
3. Run it from the command line and check the result:
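/root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --schema public -f /root/trino/tpcds-kit/TpcdsData/script/sample-queries/query1.sql

【Important】At this point every query file is run on its own, and the query time is not directly visible: you have to work out the elapsed time of each SQL statement yourself from the two now() values, which is slow and tedious. So the questions are: how can all the SQL files be run in one go? How can the query time of each one be obtained in seconds? Can each time be labeled with the query it belongs to? Method 3 below addresses these.

Method 3: Run all query SQL files from the command line and get the query times
1. Create a py file, for example bath_query_time.py, with the following contents: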
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from __future__ import print_function  # lets the script run under Python 2.7 and Python 3

import os
import subprocess
import time


def run(cmd):
    """Run a shell command; return 0 on success, -1 on failure."""
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True)
    stdout, stderr = p.communicate()  # waits for the process to finish
    if p.returncode != 0:
        print("Bad rc (%s) for cmd '%s': %s" % (p.returncode, cmd, stdout + stderr))
        return -1
    return 0


def run_one_presqlsql(sql):
    """Run one SQL file through the trino CLI; return elapsed seconds, or -1 on failure."""
    start_time = time.time()
    # Alternative commands kept from the original script:
    # run_presql_cmd = "kubectl exec -it pod/trino-cli -- trino --server trino:8080 --catalog hive --schema default --execute \"%s\"" % (sql)
    # run_presql_cmd = "cat /opt/tpcds/new_tpcds_queries/tpcds-presto/%s | mysql -h 10.201.0.204 -u root -P 9030 -p123456 sf500_02 " % (sql)
    run_presql_cmd = "/root/trino/trino-server-363/trino --server 10.201.0.125:8080 --catalog postgresql --schema public -f /root/trino/tpcds-kit/TpcdsData/script/sample-queries/%s" % (sql)
    if run(run_presql_cmd) != 0:
        return -1
    return time.time() - start_time


def get_all_queries_time_use_file(queries_dirs):
    """Run every SQL file in queries_dirs and log how long each one took."""
    logfilename = "/tmp/tpcds_time.log"
    logfile = open(logfilename, 'w+')
    for fstr in sorted(os.listdir(queries_dirs)):
        run_time = run_one_presqlsql(fstr)
        if run_time != -1:  # failed queries are not written to the log
            logfile.write("query:%s,time:%.2f\n" % (fstr, float(run_time)))
            logfile.flush()
    logfile.close()


def get_all_queries_time(queries_dirs):
    # Unused variant: it prepends "use <schema>;" and passes the SQL text itself,
    # which only suits an --execute style command, not the -f command used above.
    logfilename = "/tmp/tpcds_time.log"
    usedb = "tpcds"
    logfile = open(logfilename, 'w+')
    for fstr in sorted(os.listdir(queries_dirs)):
        f = open(os.path.join(os.path.abspath(queries_dirs), fstr), 'r')
        strbuf = "use %s; \n" % (usedb)
        for s in f.readlines():
            strbuf += s
        f.close()
        run_time = run_one_presqlsql(strbuf)
        if run_time != -1:
            logfile.write("schema:%s,query:%s,time:%.2f\n" % (usedb, fstr, float(run_time)))
            logfile.flush()
    logfile.close()


if __name__ == "__main__":
    get_all_queries_time_use_file("/root/trino/tpcds-kit/TpcdsData/script/sample-queries/")
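【Important】Adjust the following parts of the file for your environment:
- run_presql_cmd: the directory where the SQL query files are stored
- logfilename = "/tmp/tpcds_time.log": where the run log is written; the query times are recorded in it
- get_all_queries_time_use_file: the path where the queries are stored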
2. Run the script:
python bath_query_time.py
nohup python bath_query_time.py &   # run the script in the background
3. View the execution results
View them in the web UI: http://10.201.0.125:8080/ui/
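Besides the web UI, the timings written to /tmp/tpcds_time.log can be summarized directly. The following is a minimal sketch, assuming the log lines have the form query:<file>,time:<seconds> produced by get_all_queries_time_use_file; the function names parse_timings, summarize, and print_report are illustrative and not part of the original script.

```python
import re

# Matches lines such as "query:query1.sql,time:12.34" written by the timing script.
LINE_RE = re.compile(r"^query:(?P<name>[^,]+),time:(?P<secs>\d+(?:\.\d+)?)$")


def parse_timings(lines):
    """Return a list of (query_file, seconds) tuples; lines that do not match are skipped."""
    timings = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            timings.append((m.group("name"), float(m.group("secs"))))
    return timings


def summarize(timings):
    """Return (total_seconds, timings sorted slowest first)."""
    total = sum(secs for _, secs in timings)
    ranked = sorted(timings, key=lambda t: t[1], reverse=True)
    return total, ranked


def print_report(path):
    """Print one line per query, slowest first, followed by the total."""
    with open(path) as f:
        total, ranked = summarize(parse_timings(f))
    for name, secs in ranked:
        print("%-20s %8.2f s" % (name, secs))
    print("total: %.2f s over %d queries" % (total, len(ranked)))
```

Calling print_report("/tmp/tpcds_time.log") after a run lists the slowest queries first. Because the script only logs queries that succeeded, a count lower than the number of SQL files in the directory suggests that some queries failed.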
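【Write】
Writes work the same way as queries; see Method 3 above.
Create a file named bath_insert_time.py, copy the script from Method 3 of the query section into it, and change the run_presql_cmd line so that it points at the insert SQL files.
Example: view the contents of call_center.sql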
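The run_presql_cmd change can also be expressed as a small helper that takes the SQL directory as a parameter, so the same code serves both queries and writes. This is only a sketch: the CLI path, server address, catalog, and schema are the ones used earlier in this article, while build_trino_cmd and INSERT_SQL_DIR are hypothetical names, and the insert directory is a placeholder because the article does not state it.

```python
import os

TRINO_CLI = "/root/trino/trino-server-363/trino"  # CLI path used in this article
SERVER = "10.201.0.125:8080"                      # server address used in this article


def build_trino_cmd(sql_dir, sql_file, catalog="postgresql", schema="public"):
    """Build the trino CLI command line that runs one SQL file with -f."""
    path = os.path.join(sql_dir, sql_file)
    return "%s --server %s --catalog %s --schema %s -f %s" % (
        TRINO_CLI, SERVER, catalog, schema, path)


# Placeholder directory for the insert scripts (assumption, not from the article).
INSERT_SQL_DIR = "/path/to/insert-sql"
print(build_trino_cmd(INSERT_SQL_DIR, "call_center.sql"))
```

In bath_insert_time.py, run_one_presqlsql could then assign run_presql_cmd = build_trino_cmd(INSERT_SQL_DIR, sql) instead of using a hard-coded string.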
Source: https://www.cnblogs.com/syw20170419/p/15593592.html