pandas
Author: Internet
The \(matplotlib\) part is all figures, so I did not bother taking notes in .md format
pandas
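The \(pandas\) library provides two important data types: \(Series\) and \(DataFrame\). The former is one-dimensional; the latter is two-dimensional (a table of rows and columns).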
The \(Series\) data type
An index plus one-dimensional data
Creation
When no index is specified, index values start from 0
>>> import pandas as pd
>>> a=[1,2,3]
>>> m=pd.Series(a)
>>> m
0    1
1    2
2    3
dtype: int64
# The index is on the left, the data on the right
Specifying an index
>>> a=["Google","Runoob","Wiki"]
>>> m=pd.Series(a,index=[1,2,3])
>>> m
1 Google
2 Runoob
3 Wiki
dtype: object
Lookup works much like a \(map\) in C++: values are accessed by index label
>>> m[1]
'Google'
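The map analogy can be taken a little further. The following is a minimal sketch, reusing the same data as above, of the explicit accessors: `.loc` looks up by label, `.iloc` by position, and `get` returns a default for an absent label. The label `99` and the default `"n/a"` are made up for illustration.

```python
import pandas as pd

# Same data as in the notes: labels 1..3 instead of the default 0..2.
m = pd.Series(["Google", "Runoob", "Wiki"], index=[1, 2, 3])

by_label = m.loc[1]          # lookup by index label
by_position = m.iloc[0]      # lookup by integer position
missing = m.get(99, "n/a")   # like dict.get: a default instead of a KeyError
has_label = 2 in m           # membership tests the index labels, not the values

print(by_label, by_position, missing, has_label)
```

With a non-default integer index, `m[1]` means "label 1", not "second element", so `.loc` and `.iloc` make the intent unambiguous.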
A \(Series\) can also be created from a dict of key-value pairs
>>> mp={1:"Google",2:"Runoob",3:"Wiki"}
>>> m=pd.Series(mp)
>>> m
1 Google
2 Runoob
3 Wiki
dtype: object
A \(Series\) and its index can also be given names
>>> a=["Google","Runoob","Wiki"]
>>> m=pd.Series(a,index=[1,2,3],name="misasteria")
>>> m.index.name="me"
>>> m
me
1 Google
2 Runoob
3 Wiki
Name: misasteria, dtype: object
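One place where these names become visible is when the \(Series\) is converted to a \(DataFrame\). A small sketch, assuming the same data and names as above:

```python
import pandas as pd

m = pd.Series(["Google", "Runoob", "Wiki"], index=[1, 2, 3], name="misasteria")
m.index.name = "me"

frame = m.to_frame()     # the Series name becomes the column label
flat = m.reset_index()   # the index name becomes the label of a new column

print(frame.columns.tolist())
print(flat.columns.tolist())
```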
The \(DataFrame\) data type
Creation
\(pandas.DataFrame(data, index, columns, dtype, copy)\)
Create from a list
>>> data=[["Google",10],["Runoob",12],["Wiki",13]]
>>> df=pd.DataFrame(data,index=[1,2,3],columns=["site","age"])
>>> df
     site  age
1  Google   10
2  Runoob   12
3    Wiki   13
Create from a \(numpy.ndarray\)
>>> import numpy as np
>>> df=pd.DataFrame(np.arange(10).reshape(2,5))
>>> df
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
Create from a list of dicts
>>> data=[{'a':1,'b':2},{'a':5,'b':10,'c':20}]
>>> df=pd.DataFrame(data)
>>> df
   a   b     c
0  1   2   NaN
1  5  10  20.0
# Entries with no data are NaN
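For comparison, here is a short sketch of the two dict-based layouts: a dict of lists, where each key is a column, and a list of dicts, where each dict is a row, as in the example above. The column names are reused from earlier examples.

```python
import pandas as pd

# Column-oriented: each key is a column name, each list holds that column's values.
by_column = pd.DataFrame({"site": ["Google", "Runoob", "Wiki"], "age": [10, 12, 13]})

# Row-oriented: each dict is one row; keys missing from a row become NaN.
by_row = pd.DataFrame([{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}])

print(by_column.shape)
print(by_row["c"].isna().tolist())
```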
Handling csv files with \(pandas\)
csv to DataFrame
>>> df=pd.read_csv("D:\\nba.csv")
>>> print(df) # a large DataFrame prints as a truncated preview; df.to_string() returns every row as one string
              Name            Team  Number  Position   Age  Height  Weight            College     Salary
0    Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
1      Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
2     John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
3      R.J. Hunter  Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
4    Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
..             ...             ...     ...       ...   ...     ...     ...                ...        ...
453   Shelvin Mack       Utah Jazz     8.0        PG  26.0     6-3   203.0             Butler  2433333.0
454      Raul Neto       Utah Jazz    25.0        PG  24.0     6-1   179.0                NaN   900000.0
455   Tibor Pleiss       Utah Jazz    21.0         C  26.0     7-3   256.0                NaN  2900000.0
456    Jeff Withey       Utah Jazz    24.0         C  26.0     7-0   231.0             Kansas   947276.0
457            NaN             NaN     NaN       NaN   NaN     NaN     NaN                NaN        NaN

[458 rows x 9 columns]
DataFrame to csv
>>> df=pd.DataFrame(np.arange(10).reshape(2,5))
>>> df.to_csv("D:\\site.csv")
# The file is created automatically if it does not exist
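By default `to_csv` also writes the row index as an extra first column, which is easy to overlook when reading the file back. Below is a sketch of a round trip that avoids this with `index=False`. It writes to an in-memory `io.StringIO` buffer rather than `D:\\site.csv`, purely so the example has no side effects.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(2, 5))

# An in-memory text buffer stands in for a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)   # index=False omits the row-label column
buf.seek(0)

restored = pd.read_csv(buf)

# Column labels come back as strings because CSV headers are plain text.
print(restored.columns.tolist())
print(restored.values.tolist())
```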
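Data processing
\(DataFrame.head(n)\) returns the first n rows; n defaults to 5
\(DataFrame.tail(n)\) returns the last n rows; n defaults to 5
\(DataFrame.info()\) prints basic information about the DataFrame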
>>> df=pd.read_csv("D:\\nba.csv")
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457    # number of rows
Data columns (total 9 columns):      # number of columns
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Name      457 non-null    object
 1   Team      457 non-null    object
 2   Number    457 non-null    float64
 3   Position  457 non-null    object
 4   Age       457 non-null    float64
 5   Height    457 non-null    object
 6   Weight    457 non-null    float64
 7   College   373 non-null    object
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
# "non-null" is the number of entries that are not missing
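The three methods can be tried on a small frame. The data below is a hypothetical stand-in, not the real nba.csv. The sketch also uses `isna().sum()`, which counts missing entries per column and so complements the Non-Null Count shown by `info()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real nba.csv is not reproduced here.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", None],
    "Salary": [1.0, 2.0, np.nan, 4.0, 5.0, np.nan, np.nan],
})

first_two = df.head(2)                 # first 2 rows
last_default = df.tail()               # last 5 rows (the default)
missing_per_column = df.isna().sum()   # missing entries per column

print(len(first_two), len(last_default))
print(missing_per_column.to_dict())
```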
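Data operations
Arithmetic operations
Operands are aligned on their row and column labels; positions that exist in only one operand become NaN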
>>> a=pd.DataFrame(np.arange(12).reshape(3,4))
>>> a
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
>>> b=pd.DataFrame(np.arange(20).reshape(4,5))
>>> b
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
>>> a + b
      0     1     2     3   4
0   0.0   2.0   4.0   6.0 NaN
1   9.0  11.0  13.0  15.0 NaN
2  18.0  20.0  22.0  24.0 NaN
3   NaN   NaN   NaN   NaN NaN
>>> a * b
      0     1      2      3   4
0   0.0   1.0    4.0    9.0 NaN
1  20.0  30.0   42.0   56.0 NaN
2  80.0  99.0  120.0  143.0 NaN
3   NaN   NaN    NaN    NaN NaN
\(fill\_value\) supplies a substitute for entries that are missing from one of the two operands
>>> b.add(a,fill_value=100)
       0      1      2      3      4
0    0.0    2.0    4.0    6.0  104.0
1    9.0   11.0   13.0   15.0  109.0
2   18.0   20.0   22.0   24.0  114.0
3  115.0  116.0  117.0  118.0  119.0
>>> a.add(b,fill_value=100)
       0      1      2      3      4
0    0.0    2.0    4.0    6.0  104.0
1    9.0   11.0   13.0   15.0  109.0
2   18.0   20.0   22.0   24.0  114.0
3  115.0  116.0  117.0  118.0  119.0
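One detail the example above does not exercise: the substitute is only used when exactly one side is missing. If a position is missing in both operands, the result stays NaN. A small sketch with made-up values, where the gaps are explicit NaN entries rather than absent labels:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]})
b = pd.DataFrame({"x": [10.0, np.nan], "y": [20.0, 40.0]})

# fill_value=0 replaces a missing entry on one side before adding.
total = a.add(b, fill_value=0)

print(total)
```

Here `total.loc[0, "y"]` is 20.0 because only `a` lacks that value, while `total.loc[1, "x"]` remains NaN because both frames lack it.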
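Broadcasting only happens between operands of different dimensions, such as a \(DataFrame\) and a scalar. Two \(DataFrame\)s are aligned by label instead, so labels present in only one of them yield NaN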
>>> b=pd.DataFrame(np.arange(3))
>>> a+b
    0   1   2   3
0   0 NaN NaN NaN
1   5 NaN NaN NaN
2  10 NaN NaN NaN
>>> b-10
    0
0 -10
1  -9
2  -8
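To make the contrast with NumPy explicit, the sketch below computes the same sum twice: once through pandas label alignment, and once on the underlying arrays, where the (3, 1) array is broadcast across the four columns. Wrapping the NumPy result back into a \(DataFrame\) is only one possible way to get broadcast-style behaviour, chosen here for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape(3, 4))
b = pd.DataFrame(np.arange(3))   # shape (3, 1), a single column labelled 0

# pandas: label alignment, so only column 0 exists in both operands.
aligned = a + b

# NumPy: shape-based broadcasting of (3, 1) against (3, 4).
broadcast = pd.DataFrame(a.to_numpy() + b.to_numpy(),
                         index=a.index, columns=a.columns)

print(aligned)
print(broadcast)
```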
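By default a one-dimensional \(Series\) is matched against the column labels (axis 1), so it is applied to every row; passing axis=0 matches it against the row index instead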
>>> b=pd.DataFrame(np.arange(20).reshape(4,5))
>>> c=pd.Series(np.arange(4))
>>> b
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
>>> c
0    0
1    1
2    2
3    3
>>> b.sub(c)
      0     1     2     3   4
0   0.0   0.0   0.0   0.0 NaN
1   5.0   5.0   5.0   5.0 NaN
2  10.0  10.0  10.0  10.0 NaN
3  15.0  15.0  15.0  15.0 NaN
>>> b.sub(c,axis=0)
    0   1   2   3   4
0   0   1   2   3   4
1   4   5   6   7   8
2   8   9  10  11  12
3  12  13  14  15  16
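A common use of the axis argument is centering data. The sketch below reuses the same 4x5 frame and subtracts column means along the default axis and row means along axis 0. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

b = pd.DataFrame(np.arange(20).reshape(4, 5))

col_means = b.mean(axis=0)   # one value per column, indexed by column label
row_means = b.mean(axis=1)   # one value per row, indexed by row label

# Default axis: the Series index is matched against the column labels.
centered_cols = b.sub(col_means)

# axis=0: the Series index is matched against the row labels.
centered_rows = b.sub(row_means, axis=0)

print(centered_cols)
print(centered_rows)
```

Swapping the two calls would still run, but the labels would be matched on the wrong axis, so the result would not be centered as intended.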
Source: https://www.cnblogs.com/misasteria/p/16596439.html