Generating reports in Python with the pandas_profiling library

Author: Internet

Install from the command line
pip install pandas_profiling
pip install pandas_profiling==2.10.1  # pin a specific version

Install from the Tsinghua mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas_profiling

Uninstall pandas_profiling
pip uninstall pandas_profiling

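Handling an error when installing pandas_profiling
Error:
ERROR: Cannot uninstall 'PyYAML'.  It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Solution: uninstall pandas_profiling, then install it again. If pip still stops on PyYAML, a commonly used workaround is pip install --ignore-installed PyYAML, which installs over the distutils copy instead of trying to uninstall it.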

Installing a package online from a mirror (scrapy is the example package here)
pip install -i https://pypi.douban.com/simple  scrapy

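Commonly used Python package mirrors
Douban (this site is fairly stable and fairly fast)
https://pypi.douban.com/simple

Tsinghua University
https://pypi.tuna.tsinghua.edu.cn/simple

University of Science and Technology of China
https://mirrors.ustc.edu.cn/pypi/web/simple

Alibaba Cloud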
https://mirrors.aliyun.com/pypi/simple/
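
All of the mirror URLs above plug into the same pip install -i <index-url> form. As a minimal sketch, the helper below assembles that command as an argument list. The names MIRRORS and pip_install_args are hypothetical, introduced only for illustration, and the function builds the arguments without running pip.

```python
import sys

# Index URLs copied from the mirror list above; the dictionary keys are
# hypothetical short names chosen for this sketch.
MIRRORS = {
    "douban": "https://pypi.douban.com/simple",
    "tsinghua": "https://pypi.tuna.tsinghua.edu.cn/simple",
    "ustc": "https://mirrors.ustc.edu.cn/pypi/web/simple",
    "aliyun": "https://mirrors.aliyun.com/pypi/simple/",
}


def pip_install_args(package, version=None, mirror=None):
    """Build the argument list for `python -m pip install` without running it."""
    args = [sys.executable, "-m", "pip", "install"]
    if mirror is not None:
        # -i selects the package index to install from.
        args += ["-i", MIRRORS[mirror]]
    # Pin the version with `==` when one is given.
    args.append(f"{package}=={version}" if version else package)
    return args
```

Passing the returned list to subprocess.run(args, check=True) would perform the install; keeping the construction separate makes it easy to print and inspect the command first.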


import os

import pandas as pd
from pandas_profiling import ProfileReport

input_dir = r"../test_data"   # directory scanned for CSV files
output_dir = "../test_data"   # directory the HTML reports are written to
hospital = "XX"               # prefix used in the report title

for path, dir_list, file_list in os.walk(input_dir):
    for file_name in file_list:
        if file_name == "XX.csv":  # restrict to one file when profiling a single table
            file_path = os.path.join(path, file_name)
            df = pd.read_csv(file_path)
            # Table name: the file name without its extension, lower-cased.
            # splitext keeps the "csv" extension from being treated as a table name.
            table_name = os.path.splitext(file_name)[0].lower()
            # minimal=True gives a lighter report; omit it for the more detailed report.
            profile = ProfileReport(df, title=f"{hospital} {table_name} table data quality report", minimal=True)
            profile.to_file(output_file=os.path.join(output_dir, table_name + ".html"))
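
The script derives each report's name from the CSV file name. As a minimal, standard-library-only sketch of that discovery step, the helper below lists which reports would be written. It assumes every .csv file under the input directory maps to one lower-cased <table>.html report. The name find_report_jobs is hypothetical, and the profiling call itself is left out.

```python
import os


def find_report_jobs(input_dir, output_dir, only=None):
    """Yield (csv_path, table_name, html_path) for each CSV under input_dir.

    `only` optionally restricts the walk to one file name, e.g. "XX.csv".
    """
    for path, _dirs, file_list in os.walk(input_dir):
        for file_name in sorted(file_list):
            stem, ext = os.path.splitext(file_name)
            if ext.lower() != ".csv":
                continue  # skip anything that is not a CSV file
            if only is not None and file_name != only:
                continue  # single-table mode: skip every other file
            table_name = stem.lower()
            yield (
                os.path.join(path, file_name),
                table_name,
                os.path.join(output_dir, table_name + ".html"),
            )
```

Iterating over find_report_jobs("../test_data", "../test_data") before profiling gives a quick dry run of the file names, which is useful because profiling a large table can take a while.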

Pandas Profiling

Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis.
pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

For each column, the statistics relevant to that column's type are presented in an interactive HTML report.

Announcements

Version v2.10.0rc1 released

v2.10.0rc1 includes a major overhaul of the type system, now fully reliant on visions.
See the changelog below to know what has changed.

Spark backend in progress

We can happily announce that we’re nearing v1 for the Spark backend for generating profile reports.
Stay tuned.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions.
If you find value in the package, we welcome you to support the project through GitHub Sponsors!
It’s extra exciting that GitHub matches your contribution for the first year.

January 5, 2021

Tags: profile, profiling, Python, file, report, data, pandas
Source: https://blog.csdn.net/qq_43278973/article/details/122718777