Data Analysis Day 04

Author: Internet

7. Advanced pandas operations

In [53]:

import pandas as pd
from pandas import DataFrame
import numpy as np

Replace operations

In [4]:

df = DataFrame(data=np.random.randint(0,100,size=(6,7)))
df

Out[4]:

0 1 2 3 4 5 6
0 44 62 3 85 26 47 14
1 15 78 32 98 85 4 51
2 53 75 87 21 45 8 18
3 54 31 67 49 77 25 49
4 18 21 18 31 93 11 0
5 21 54 76 95 70 77 49

In [5]:

df.replace(to_replace=3,value='Three')

Out[5]:

0 1 2 3 4 5 6
0 44 62 Three 85 26 47 14
1 15 78 32 98 85 4 51
2 53 75 87 21 45 8 18
3 54 31 67 49 77 25 49
4 18 21 18 31 93 11 0
5 21 54 76 95 70 77 49

In [6]:

df.replace(to_replace={3:'aaa'})

Out[6]:

0 1 2 3 4 5 6
0 44 62 aaa 85 26 47 14
1 15 78 32 98 85 4 51
2 53 75 87 21 45 8 18
3 54 31 67 49 77 25 49
4 18 21 18 31 93 11 0
5 21 54 76 95 70 77 49

In [8]:

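# Replace a value only in a specific column: the dict key (5) is the column label, its value (77) is the value to replace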
df.replace(to_replace={5:77},value='6666666')

Out[8]:

0 1 2 3 4 5 6
0 44 62 3 85 26 47 14
1 15 78 32 98 85 4 51
2 53 75 87 21 45 8 18
3 54 31 67 49 77 25 49
4 18 21 18 31 93 11 0
5 21 54 76 95 70 6666666 49
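
Because the frame above is random, it is hard to predict which cells will match. The following is a minimal sketch on a small fixed frame (`demo` and its values are made up for illustration) that contrasts the three call forms used above.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible
demo = pd.DataFrame({0: [3, 77], 5: [77, 3]})

# 1) Scalar form: replace the value wherever it occurs
everywhere = demo.replace(to_replace=3, value='Three')

# 2) Dict form without value: each key is an old value, each dict value its replacement
by_dict = demo.replace(to_replace={3: 'aaa', 77: 'bbb'})

# 3) Dict form with value: keys are column labels, so only column 5 is searched for 77
one_column = demo.replace(to_replace={5: 77}, value='6666666')

print(one_column)
```

In the third call the 77 in column 0 is left untouched, which is the difference from the scalar form.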

Mapping operations

In [10]:

dic = {
    'name':['jay','tom','jay'],
    'salary':[10000,20000,10000]
}
df = DataFrame(data=dic)
df

Out[10]:

name salary
0 jay 10000
1 tom 20000
2 jay 10000

In [14]:

# Lookup table that defines the mapping
dic = {
    'jay':'张三',
    'tom':'李四'
}
df['c_name'] = df['name'].map(dic)
df

Out[14]:

name salary c_name
0 jay 10000 张三
1 tom 20000 李四
2 jay 10000 张三
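
One detail worth knowing about `Series.map` with a dict is that values missing from the dict become NaN. A small sketch, using a made-up name (`'bob'`) that is deliberately absent from the lookup table:

```python
import pandas as pd

names = pd.Series(['jay', 'tom', 'bob'])
lookup = {'jay': 'J', 'tom': 'T'}  # 'bob' is deliberately missing

# Values without an entry in the dict are mapped to NaN
mapped = names.map(lookup)

# One possible fallback: keep the original value where no mapping exists
filled = mapped.fillna(names)
print(filled.tolist())
```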

Using map as a computation tool

In [16]:

def after_sal(s):
    return s - (s-3000)*0.5

In [18]:

df['after_salary'] = df['salary'].map(after_sal)
df

Out[18]:

name salary c_name after_salary
0 jay 10000 张三 6500.0
1 tom 20000 李四 11500.0
2 jay 10000 张三 6500.0
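
`map` calls the function once per element. For plain arithmetic like this, the same expression can also be applied to the whole column at once. A short sketch comparing the two (the salary values are the ones from the example above):

```python
import pandas as pd

salary = pd.Series([10000, 20000, 10000])

def after_sal(s):
    # Keep the first 3000 in full and half of everything above it
    return s - (s - 3000) * 0.5

via_map = salary.map(after_sal)              # after_sal is called once per element
vectorized = salary - (salary - 3000) * 0.5  # one operation on the whole column
```

Both give the same result; `map` remains the more general tool when the logic cannot be written as column arithmetic.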

Mapping index and column labels

In [19]:

df4 = DataFrame({'color':['white','gray','purple','blue','green'],'value':np.random.randint(10,size = 5)})
df4

Out[19]:

color value
0 white 2
1 gray 5
2 purple 9
3 blue 0
4 green 1

In [20]:

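new_index = {0:'first',1:'two',2:'three',3:'four',4:'five'}
new_col={'color':'cc','value':'vv'}
# Pass both mappings by keyword; recent pandas versions reject a positional mapper combined with columns=
df4.rename(index=new_index,columns=new_col)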

Out[20]:

cc vv
first white 2
two gray 5
three purple 9
four blue 0
five green 1
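
Note that `rename` returns a new frame and leaves the original untouched, and labels that are not in the mapping are kept as they are. A minimal sketch on a two-row frame (the data is a shortened version of the example above):

```python
import pandas as pd

df4 = pd.DataFrame({'color': ['white', 'gray'], 'value': [2, 5]})

# Only 'color' is listed in the column mapping, so 'value' keeps its name
renamed = df4.rename(index={0: 'first', 1: 'second'}, columns={'color': 'cc'})

print(renamed)
print(df4.columns.tolist())  # the original frame is unchanged
```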

Random sampling by shuffling the order

In [22]:

df = DataFrame(data=np.random.randint(0,100,size=(100,3)),columns=['A','B','C'])
df

In [24]:

# take selects by integer position, not by label, so df.take(['B','A','C'],axis=1) would not work
df.take([1,0,2],axis=1)

In [32]:

np.random.permutation(3) # returns the integers 0..2 in random order

Out[32]:

array([0, 1, 2])

In [31]:

# Shuffle both the row order and the column order
df.take(np.random.permutation(100),axis=0).take(np.random.permutation(3),axis=1)

In [35]:

df.take(np.random.permutation(100),axis=0).take(np.random.permutation(3),axis=1)[0:50]
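
The shuffle-then-slice approach above can be made reproducible with a seeded generator, and pandas also offers `DataFrame.sample` for the same purpose. A sketch under those assumptions (the seed value 0 is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the shuffle is repeatable
df = pd.DataFrame(rng.integers(0, 100, size=(100, 3)), columns=['A', 'B', 'C'])

# Shuffle the rows by position, then keep the first 50 as the sample
shuffled = df.take(rng.permutation(len(df)), axis=0)
half_a = shuffled[0:50]

# Built-in alternative: draw 50 rows without replacement
half_b = df.sample(n=50, random_state=0)
```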

Grouping data

In [36]:

df = DataFrame({'item':['Apple','Banana','Orange','Banana','Orange','Apple'],
                'price':[4,3,3,2.5,4,2],
               'color':['red','yellow','yellow','green','green','green'],
               'weight':[12,20,50,30,20,44]})
df

Out[36]:

color item price weight
0 red Apple 4.0 12
1 yellow Banana 3.0 20
2 yellow Orange 3.0 50
3 green Banana 2.5 30
4 green Orange 4.0 20
5 green Apple 2.0 44

In [37]:

# Group the rows by kind of fruit
df.groupby(by='item')

Out[37]:

<pandas.core.groupby.DataFrameGroupBy object at 0x0000019782507F60>

In [38]:

# Use .groups to see which row labels fall into each group
df.groupby(by='item').groups

Out[38]:

{'Apple': Int64Index([0, 5], dtype='int64'),
 'Banana': Int64Index([1, 3], dtype='int64'),
 'Orange': Int64Index([2, 4], dtype='int64')}

In [40]:

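# Compute the mean price of each kind of fruit
# (numeric_only=True skips the text column 'color', which newer pandas versions refuse to average)
df.groupby(by='item').mean(numeric_only=True)['price']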

Out[40]:

item
Apple     3.00
Banana    2.75
Orange    3.50
Name: price, dtype: float64

In [41]:

df.groupby(by='item')['price'].mean() # recommended: select the column first, then aggregate

Out[41]:

item
Apple     3.00
Banana    2.75
Orange    3.50
Name: price, dtype: float64

In [42]:

# Compute the mean weight of the fruit of each color
df.groupby(by='color')['weight'].mean()

Out[42]:

color
green     31.333333
red       12.000000
yellow    35.000000
Name: weight, dtype: float64

In [44]:

# Compute the mean price of each kind of fruit and merge it back into the original data
df

Out[44]:

color item price weight
0 red Apple 4.0 12
1 yellow Banana 3.0 20
2 yellow Orange 3.0 50
3 green Banana 2.5 30
4 green Orange 4.0 20
5 green Apple 2.0 44

In [47]:

series_price = df.groupby(by='item')['price'].mean() 
dic = series_price.to_dict()
dic # lookup table: item -> mean price

Out[47]:

{'Apple': 3.0, 'Banana': 2.75, 'Orange': 3.5}

In [49]:

df['mean_price'] = df['item'].map(dic)
df

Out[49]:

color item price weight mean_price
0 red Apple 4.0 12 3.00
1 yellow Banana 3.0 20 2.75
2 yellow Orange 3.0 50 3.50
3 green Banana 2.5 30 2.75
4 green Orange 4.0 20 3.50
5 green Apple 2.0 44 3.00

Advanced aggregation

In [56]:

def myMean(s):
    total = 0  # named total so the built-in sum() is not shadowed
    for i in s:
        total += i
    return total/len(s)

In [57]:

df.groupby(by='item')['price'].apply(myMean) # apply runs the custom aggregation: one result per group

Out[57]:

item
Apple     3.00
Banana    2.75
Orange    3.50
Name: price, dtype: float64

In [58]:

df.groupby(by='item')['price'].transform(myMean) # transform returns the group result for every original row

Out[58]:

0    3.00
1    2.75
2    3.50
3    2.75
4    3.50
5    3.00
Name: price, dtype: float64
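
Since `transform` returns one value per original row, its result can be assigned straight back as a column, which replaces the `to_dict` plus `map` steps used earlier. A minimal sketch using the built-in `'mean'` aggregation instead of `myMean`:

```python
import pandas as pd

df = pd.DataFrame({'item': ['Apple', 'Banana', 'Orange', 'Banana', 'Orange', 'Apple'],
                   'price': [4, 3, 3, 2.5, 4, 2]})

grouped = df.groupby('item')['price']
per_group = grouped.mean()            # one row per item
per_row = grouped.transform('mean')   # one value per original row, same index as df

# The transform result lines up with df, so it can be stored directly
df['mean_price'] = per_row
print(df)
```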

Loading data

In [50]:

data_1 = pd.read_csv('./data/type-.txt',sep='-',header=None)

In [46]:

# Connect to the database and get a connection object
import sqlite3
conn=sqlite3.connect('./data/weather_2012.sqlite')

In [47]:

# Read the rows of a database table into a DataFrame
sql_df=pd.read_sql('select * from weather_2012',conn)
sql_df

In [51]:

# Write the contents of a DataFrame into the database
data_1.to_sql('sql_data123',conn)

In [52]:

pd.read_sql('select * from sql_data123',conn)

Out[52]:

index 0 1 2
0 0 你好 我好 他也好
1 1 也许 大概 有可能
2 2 然而 未必 不见得
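
The extra `index` column in the output above appears because `to_sql` stores the DataFrame index by default. The sketch below shows a round trip against an in-memory SQLite database; the table name `demo_table` and the data are made up for illustration.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')  # throwaway database, nothing is written to disk
frame = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

# index=False leaves the DataFrame index out of the table;
# if_exists='replace' lets the cell be re-run (the default 'fail' raises if the table exists)
frame.to_sql('demo_table', conn, index=False, if_exists='replace')

back = pd.read_sql('select * from demo_table', conn)
conn.close()
print(back)
```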

Pivot tables

In [6]:

import pandas as pd
import numpy as np

In [15]:

df = pd.read_csv('./data/games.csv',encoding='utf-8')
df.head()

Out[15]:

对手 胜负 主客场 命中 投篮数 投篮命中率 3分命中率 篮板 助攻 得分
0 勇士 10 23 0.435 0.444 6 11 27
1 国王 8 21 0.381 0.286 3 9 27
2 小牛 10 19 0.526 0.462 3 7 29
3 灰熊 8 20 0.400 0.250 5 8 22
4 76人 10 20 0.500 0.250 3 13 27

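pivot_table has four key parameters: index, values, columns, and aggfunc. The column names in this dataset are Chinese: 对手 (opponent), 胜负 (win/loss), 主客场 (home/away), 命中 (shots made), 投篮数 (shot attempts), 投篮命中率 (field-goal percentage), 3分命中率 (three-point percentage), 篮板 (rebounds), 助攻 (assists), 得分 (points).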

In [16]:

df.pivot_table(index='对手')

In [17]:

df.pivot_table(index=['对手','主客场'])

In [19]:

df.pivot_table(index=['主客场','胜负'],values=['得分','篮板','助攻'])

Out[19]:

助攻 得分 篮板
主客场 胜负
10.555556 34.222222 5.444444
8.666667 29.666667 5.000000
9.000000 32.000000 4.916667
8.000000 20.000000 4.000000

In [23]:

df.pivot_table(index=['主客场','胜负'],values=['得分','篮板','助攻'],aggfunc='sum')

Out[23]:

助攻 得分 篮板
主客场 胜负
95 308 49
26 89 15
108 384 59
8 20 4

In [24]:

# To also get James Harden's total points, mean rebounds, and maximum assists for each home/away and win/loss combination
df.pivot_table(index=['主客场','胜负'],aggfunc={'得分':'sum','篮板':'mean','助攻':'max'})

Out[24]:

助攻 得分 篮板
主客场 胜负
17 308 5.444444
11 89 5.000000
15 384 4.916667
8 20 4.000000

In [35]:

df.pivot_table(index='主客场',values='得分',aggfunc='sum',columns='对手')

Out[35]:

对手 76人 勇士 国王 太阳 小牛 尼克斯 开拓者 掘金 步行者 湖人 灰熊 爵士 猛龙 篮网 老鹰 骑士 鹈鹕 黄蜂
主客场
29.0 NaN NaN NaN 29.0 37.0 NaN 21.0 29.0 NaN 60.0 56.0 38.0 37.0 NaN 35.0 26.0 NaN
27.0 27.0 27.0 48.0 NaN 31.0 48.0 NaN 26.0 36.0 49.0 29.0 NaN NaN 29.0 NaN NaN 27.0
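
To make the roles of the four parameters easier to follow without the games file, here is a small sketch on made-up data with English column names (`venue`, `result`, `points`, `assists` are all hypothetical stand-ins for the columns above).

```python
import pandas as pd

games = pd.DataFrame({
    'venue':   ['home', 'home', 'away', 'away', 'home'],
    'result':  ['win', 'loss', 'win', 'win', 'win'],
    'points':  [30, 20, 25, 35, 40],
    'assists': [10, 8, 7, 9, 12],
})

# index: the grouping keys; values: the columns to aggregate (the default aggfunc is the mean)
mean_table = games.pivot_table(index=['venue', 'result'], values=['points', 'assists'])

# aggfunc as a dict: a different aggregation per column
mixed = games.pivot_table(index='venue', aggfunc={'points': 'sum', 'assists': 'max'})

# columns: spread a second key across the columns; combinations that never occur become NaN
wide = games.pivot_table(index='venue', columns='result', values='points', aggfunc='sum')
print(wide)
```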

Cross-tabulation

In [36]:

df = DataFrame({'sex':['man','man','women','women','man','women','man','women','women'],
               'age':[15,23,25,17,35,57,24,31,22],
               'smoke':[True,False,False,True,True,False,False,True,False],
               'height':[168,179,181,166,173,178,188,190,160]})
df

Out[36]:

age height sex smoke
0 15 168 man True
1 23 179 man False
2 25 181 women False
3 17 166 women True
4 35 173 man True
5 57 178 women False
6 24 188 man False
7 31 190 women True
8 22 160 women False

In [37]:

pd.crosstab(df.smoke,df.sex)

Out[37]:

sex man women
smoke
False 2 3
True 2 2

In [38]:

pd.crosstab(df.sex,df.smoke)

Out[38]:

smoke False True
sex
man 2 2
women 3 2

In [41]:

pd.crosstab(df.age,df.smoke)

Out[41]:

smoke False True
age
15 0 1
17 0 1
22 1 0
23 1 0
24 1 0
25 1 0
31 0 1
35 0 1
57 1 0
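
`pd.crosstab` counts co-occurrences by default. Two optional parameters that are often useful are `margins`, which adds row and column totals, and `normalize`, which turns counts into proportions. A sketch on the `sex` and `smoke` columns from the example above:

```python
import pandas as pd

people = pd.DataFrame({
    'sex':   ['man', 'man', 'women', 'women', 'man', 'women', 'man', 'women', 'women'],
    'smoke': [True, False, False, True, True, False, False, True, False],
})

# margins=True adds an 'All' row and an 'All' column holding the totals
counts = pd.crosstab(people.sex, people.smoke, margins=True)

# normalize='index' divides each row by its total, so every row sums to 1
shares = pd.crosstab(people.sex, people.smoke, normalize='index')
print(counts)
print(shares)
```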

8. Analysis of campaign contributions in the 2012 US presidential election

In [51]:

import pandas as pd
from pandas import DataFrame
import numpy as np

In [52]:

# For convenience, define the month numbers and each candidate's party up front:
months = {'JAN' : 1, 'FEB' : 2, 'MAR' : 3, 'APR' : 4, 'MAY' : 5, 'JUN' : 6,
          'JUL' : 7, 'AUG' : 8, 'SEP' : 9, 'OCT': 10, 'NOV': 11, 'DEC' : 12}
parties = {
  'Bachmann, Michelle': 'Republican',
  'Romney, Mitt': 'Republican',
  'Obama, Barack': 'Democrat',
  "Roemer, Charles E. 'Buddy' III": 'Reform',
  'Pawlenty, Timothy': 'Republican',
  'Johnson, Gary Earl': 'Libertarian',
  'Paul, Ron': 'Republican',
  'Santorum, Rick': 'Republican',
  'Cain, Herman': 'Republican',
  'Gingrich, Newt': 'Republican',
  'McCotter, Thaddeus G': 'Republican',
  'Huntsman, Jon': 'Republican',
  'Perry, Rick': 'Republican'           
 }

Requirements

In [53]:

# Load the data and look at its basic information
df = pd.read_csv('./data/usa_election.txt')
df.head()
C:\Users\laonanhai\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2728: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)

Out[53]:

cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num
0 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 250.0 20-JUN-11 NaN NaN NaN SA17A 736166
1 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 50.0 23-JUN-11 NaN NaN NaN SA17A 736166
2 C00410118 P20002978 Bachmann, Michelle SMITH, LANIER LANETT AL 3.68633e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 05-JUL-11 NaN NaN NaN SA17A 749073
3 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 01-AUG-11 NaN NaN NaN SA17A 749073
4 C00410118 P20002978 Bachmann, Michelle WARDENBURG, HAROLD HOT SPRINGS NATION AR 7.19016e+08 NONE RETIRED 300.0 20-JUN-11 NaN NaN NaN SA17A 736166

In [54]:

# Check whether the raw data contains missing values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536041 entries, 0 to 536040
Data columns (total 16 columns):
cmte_id              536041 non-null object
cand_id              536041 non-null object
cand_nm              536041 non-null object
contbr_nm            536041 non-null object
contbr_city          536026 non-null object
contbr_st            536040 non-null object
contbr_zip           535973 non-null object
contbr_employer      525088 non-null object
contbr_occupation    530520 non-null object
contb_receipt_amt    536041 non-null float64
contb_receipt_dt     536041 non-null object
receipt_desc         8479 non-null object
memo_cd              49718 non-null object
memo_text            52740 non-null object
form_tp              536041 non-null object
file_num             536041 non-null int64
dtypes: float64(1), int64(1), object(14)
memory usage: 65.4+ MB

In [55]:

df.describe()

Out[55]:

contb_receipt_amt file_num
count 5.360410e+05 536041.000000
mean 3.750373e+02 761472.107800
std 3.564436e+03 5148.893508
min -3.080000e+04 723511.000000
25% 5.000000e+01 756218.000000
50% 1.000000e+02 763233.000000
75% 2.500000e+02 763621.000000
max 1.944042e+06 767394.000000

In [56]:

# Handle missing values. Some fields are empty, perhaps because they were left blank or withheld; fill them with NOT PROVIDE
df.fillna(value='NOT PROVIDE',inplace=True)

In [57]:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536041 entries, 0 to 536040
Data columns (total 16 columns):
cmte_id              536041 non-null object
cand_id              536041 non-null object
cand_nm              536041 non-null object
contbr_nm            536041 non-null object
contbr_city          536041 non-null object
contbr_st            536041 non-null object
contbr_zip           536041 non-null object
contbr_employer      536041 non-null object
contbr_occupation    536041 non-null object
contb_receipt_amt    536041 non-null float64
contb_receipt_dt     536041 non-null object
receipt_desc         536041 non-null object
memo_cd              536041 non-null object
memo_text            536041 non-null object
form_tp              536041 non-null object
file_num             536041 non-null int64
dtypes: float64(1), int64(1), object(14)
memory usage: 65.4+ MB

In [58]:

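# Handle outliers: drop the rows whose contribution amount is <= 0
# (.copy() makes the result an independent frame, so adding columns later does not trigger SettingWithCopyWarning)
df = df.loc[~(df['contb_receipt_amt'] <= 0)].copy()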
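
The two cleaning steps can be checked on a tiny made-up frame. The sketch below uses invented rows (`raw`) with one missing employer and two non-positive amounts; taking a `.copy()` of the filtered result is one way to make sure later column assignments act on an independent frame.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one missing employer, one refund, one zero amount
raw = pd.DataFrame({'contbr_employer': ['ACME', np.nan, 'NONE'],
                    'contb_receipt_amt': [250.0, -30.0, 0.0]})

# Step 1: fill missing values with a placeholder string
filled = raw.fillna(value='NOT PROVIDE')

# Step 2: keep only the positive amounts, as an independent copy
clean = filled.loc[filled['contb_receipt_amt'] > 0].copy()

# Adding a column to the copy does not touch the unfiltered frame
clean['flag'] = 'ok'
print(clean)
```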

In [59]:

# See how many candidates are in the race
df['cand_nm'].unique()

Out[59]:

array(['Bachmann, Michelle', 'Romney, Mitt', 'Obama, Barack',
       "Roemer, Charles E. 'Buddy' III", 'Pawlenty, Timothy',
       'Johnson, Gary Earl', 'Paul, Ron', 'Santorum, Rick',
       'Cain, Herman', 'Gingrich, Newt', 'McCotter, Thaddeus G',
       'Huntsman, Jon', 'Perry, Rick'], dtype=object)

In [60]:

df['cand_nm'].nunique()

Out[60]:

13

In [61]:

# Add a new column, party, holding each candidate's party
df['party'] = df['cand_nm'].map(parties)
df.head()

Out[61]:

cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num party
0 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 250.0 20-JUN-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
1 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 50.0 23-JUN-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
2 C00410118 P20002978 Bachmann, Michelle SMITH, LANIER LANETT AL 3.68633e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 05-JUL-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
3 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 01-AUG-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
4 C00410118 P20002978 Bachmann, Michelle WARDENBURG, HAROLD HOT SPRINGS NATION AR 7.19016e+08 NONE RETIRED 300.0 20-JUN-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican

In [62]:

# See which distinct values the party column contains, then count how often each one occurs
df['party'].unique()

Out[62]:

array(['Republican', 'Democrat', 'Reform', 'Libertarian'], dtype=object)

In [63]:

df['party'].value_counts() # value_counts() counts how often each value occurs in the Series

Out[63]:

Democrat       289999
Republican     234300
Reform           5313
Libertarian       702
Name: party, dtype: int64

In [64]:

# Total contributions (contb_receipt_amt) received by each party
df.groupby(by='party')['contb_receipt_amt'].sum()

Out[64]:

party
Democrat       8.259441e+07
Libertarian    4.132769e+05
Reform         3.429658e+05
Republican     1.251181e+08
Name: contb_receipt_amt, dtype: float64

In [65]:

# Total contributions (contb_receipt_amt) received by each party on each day
df.groupby(by=['contb_receipt_dt','party'])['contb_receipt_amt'].sum()

Out[65]:

contb_receipt_dt  party      
01-APR-11         Reform              50.00
                  Republican       12635.00
01-AUG-11         Democrat        182198.00
                  Libertarian       1000.00
                  Reform            1847.00
                  Republican      268903.02
01-DEC-11         Democrat        651982.82
                  Libertarian        725.00
                  Reform             875.00
                  Republican      505255.96
01-FEB-11         Republican         250.00
01-JAN-11         Republican        8600.00
01-JAN-12         Democrat         74303.80
                  Reform             515.00
                  Republican       76804.72
01-JUL-11         Democrat        175364.00
                  Libertarian       2000.00
                  Reform             100.00
                  Republican      125973.72
01-JUN-11         Democrat        148409.00
                  Libertarian        500.00
                  Reform              50.00
                  Republican      435609.20
01-MAR-11         Republican        1000.00
01-MAY-11         Democrat         82644.00
                  Reform             480.00
                  Republican       28663.87
01-NOV-11         Democrat        129309.87
                  Libertarian       3000.00
                  Reform            1792.00
                                    ...    
30-OCT-11         Reform            3910.00
                  Republican       46413.16
30-SEP-11         Democrat       3409587.24
                  Libertarian        550.00
                  Reform            2050.00
                  Republican     5094824.20
31-AUG-11         Democrat        375487.44
                  Libertarian      10750.00
                  Reform             450.00
                  Republican     1038330.90
31-DEC-11         Democrat       3571793.57
                  Reform             695.00
                  Republican     1165777.72
31-JAN-11         Republican        6000.00
31-JAN-12         Democrat       1421887.31
                  Reform             150.00
                  Republican      963681.41
31-JUL-11         Democrat         20305.00
                  Reform            1066.00
                  Republican       12781.02
31-MAR-11         Reform             200.00
                  Republican       74575.00
31-MAY-11         Democrat        352005.66
                  Libertarian        250.00
                  Reform             100.00
                  Republican      313839.80
31-OCT-11         Democrat        216971.87
                  Libertarian       4250.00
                  Reform            3205.00
                  Republican      751542.36
Name: contb_receipt_amt, Length: 1183, dtype: float64
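
A two-level result like the one above is often easier to read as a table with one row per day and one column per party. `Series.unstack` does that reshaping. The rows in this sketch are made up for illustration, with amounts borrowed from the first lines of the output above.

```python
import pandas as pd

# Hypothetical rows, one per (day, party) pair
gifts = pd.DataFrame({
    'contb_receipt_dt': ['01-APR-11', '01-APR-11', '01-AUG-11', '01-AUG-11'],
    'party': ['Reform', 'Republican', 'Democrat', 'Republican'],
    'contb_receipt_amt': [50.0, 12635.0, 182198.0, 268903.02],
})

daily = gifts.groupby(['contb_receipt_dt', 'party'])['contb_receipt_amt'].sum()

# Move the 'party' level into the columns; days on which a party received nothing get 0
wide = daily.unstack('party', fill_value=0)
print(wide)
```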

In [66]:

df.columns

Out[66]:

Index(['cmte_id', 'cand_id', 'cand_nm', 'contbr_nm', 'contbr_city',
       'contbr_st', 'contbr_zip', 'contbr_employer', 'contbr_occupation',
       'contb_receipt_amt', 'contb_receipt_dt', 'receipt_desc', 'memo_cd',
       'memo_text', 'form_tp', 'file_num', 'party'],
      dtype='object')

In [67]:

# Convert the dates in the table to the 'yyyy-mm-dd' format.
def transformDate(d):
    day,month,year = d.split('-')
    month = months[month]
    # zero-pad the month so that e.g. JUN becomes 06 rather than 6
    return '20'+year+'-'+str(month).zfill(2)+'-'+day
df['contb_receipt_dt'] = df['contb_receipt_dt'].map(transformDate)
df.head()

Out[67]:

cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num party
0 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 250.0 2011-06-20 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
1 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 50.0 2011-06-23 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
2 C00410118 P20002978 Bachmann, Michelle SMITH, LANIER LANETT AL 3.68633e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 2011-07-05 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
3 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 2011-08-01 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
4 C00410118 P20002978 Bachmann, Michelle WARDENBURG, HAROLD HOT SPRINGS NATION AR 7.19016e+08 NONE RETIRED 300.0 2011-06-20 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
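
As an alternative to splitting the string by hand, pandas can parse this date format directly. The sketch below assumes an English locale, because the `%b` directive matches locale-dependent month abbreviations; the three sample dates are made up in the same style as the data.

```python
import pandas as pd

raw = pd.Series(['20-JUN-11', '05-JUL-11', '01-JAN-12'])

# %d = day, %b = abbreviated month name, %y = two-digit year
parsed = pd.to_datetime(raw, format='%d-%b-%y')

# Back to text in yyyy-mm-dd form, with zero-padded month and day
iso = parsed.dt.strftime('%Y-%m-%d')
print(iso.tolist())
```

Keeping the parsed datetime column rather than the string has the added benefit that sorting and grouping by month work chronologically.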

In [68]:

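# Find out which candidates contributors with the occupation DISABLED VETERAN mainly support (by amount donated)

# 1. Select the rows for disabled veterans
veteran_df = df.loc[df['contbr_occupation'] == 'DISABLED VETERAN']

# 2. Group by candidate and sum the amounts
veteran_df.groupby(by='cand_nm')['contb_receipt_amt'].sum()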

Out[68]:

cand_nm
Cain, Herman       300.00
Obama, Barack     4205.00
Paul, Ron         2425.49
Santorum, Rick     250.00
Name: contb_receipt_amt, dtype: float64

In [69]:

# For each candidate, find the occupation and amount of the contributor who gave the largest single contribution
s = df.groupby(by=['cand_nm'])['contb_receipt_amt'].max()
s

Out[69]:

cand_nm
Bachmann, Michelle                   3022.00
Cain, Herman                        10000.00
Gingrich, Newt                       5100.00
Huntsman, Jon                        5000.00
Johnson, Gary Earl                   2500.00
McCotter, Thaddeus G                 4000.00
Obama, Barack                     1944042.43
Paul, Ron                            5000.00
Pawlenty, Timothy                   10000.00
Perry, Rick                         10000.00
Roemer, Charles E. 'Buddy' III        200.00
Romney, Mitt                        12700.00
Santorum, Rick                       5000.00
Name: contb_receipt_amt, dtype: float64

In [70]:

s.index[0]

Out[70]:

'Bachmann, Michelle'

In [71]:

for name, amount in s.items():
    # Compare against the full float value: formatting it with %d would truncate 1944042.43 to 1944042 and match no row
    display(df.query('cand_nm == @name and contb_receipt_amt == @amount'))
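
The same question can be answered without a loop by asking each group for the row label of its maximum and selecting those rows. A sketch on made-up data (candidates `A` and `B` and their rows are hypothetical):

```python
import pandas as pd

gifts = pd.DataFrame({
    'cand_nm': ['A', 'A', 'B', 'B'],
    'contbr_occupation': ['RETIRED', 'ENGINEER', 'TEACHER', 'LAWYER'],
    'contb_receipt_amt': [250.0, 1944042.43, 50.0, 300.5],
})

# idxmax gives, per candidate, the row label of the largest contribution
top_labels = gifts.groupby('cand_nm')['contb_receipt_amt'].idxmax()

# Selecting those labels returns the full rows, including the occupation
top_rows = gifts.loc[top_labels]
print(top_rows)
```

Unlike the query loop, `idxmax` returns only the first row when several contributions tie for the maximum.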

9. Plotting with matplotlib

Line plots with plt.plot()

In [6]:

import numpy as np

In [4]:

import matplotlib.pyplot as plt
# Make sure figures are rendered inline in the notebook.
# The comment has to be on its own line: IPython treats text after a line magic as its arguments.
%matplotlib inline

In [5]:

x = [1,2,3,4,5]
y = [5,4,3,2,1]

plt.plot(x,y)

Out[5]:

[<matplotlib.lines.Line2D at 0x1a72b583cf8>]

In [8]:

# Draw two lines in the same axes
xx = np.linspace(-np.pi,np.pi,num=20)
yy = xx ** 2
plt.plot(x,y)  # each additional plot() call adds another line
plt.plot(xx,yy)

Out[8]:

[<matplotlib.lines.Line2D at 0x1a72b870ef0>]

In [15]:

# Place several axes in one grid
ax1 = plt.subplot(2,2,1) # grid size (2 rows x 2 columns) and the position of this axes in it
ax1.plot(x,y)


ax2 = plt.subplot(2,2,2)
ax2.plot(xx,yy)


ax3 = plt.subplot(2,2,3)
ax3.plot(xx,yy)


ax4 = plt.subplot(2,2,4)
ax4.plot(x,y)

Out[15]:

[<matplotlib.lines.Line2D at 0x1a72bd5fb70>]
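
The same 2 x 2 grid can be created in one call with `plt.subplots`, which returns the figure and an array of axes. This is a sketch of that alternative; the `Agg` backend is chosen here only so that it runs without a display, and is not needed inside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
xx = np.linspace(-np.pi, np.pi, num=20)
yy = xx ** 2

# axes is a 2 x 2 array; axes[row, col] addresses one cell of the grid
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
axes[0, 0].plot(x, y)
axes[0, 1].plot(xx, yy)
axes[1, 0].plot(xx, yy)
axes[1, 1].plot(x, y)
fig.tight_layout()  # keep tick labels of neighbouring axes from overlapping
```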

In [17]:

# plt.figure(figsize=(a,b)) sets the figure width a and height b in inches
plt.figure(figsize=(4,8))
plt.plot(x,y)

In [23]:

# Add a legend
plt.plot(xx,yy,label='aaa')
plt.plot(xx-1,yy+1,label='bbb')
plt.legend(loc=1)  # loc=1 is the same as loc='upper right'

Out[23]:

<matplotlib.legend.Legend at 0x1a72d8d4198>

In [25]:

# Label the axes and add a title
plt.plot(xx-1,yy+1,label='bbb')
plt.xlabel('distance')
plt.ylabel('temp')
plt.title('aaa')

Out[25]:

Text(0.5,1,'aaa')

Bar charts: plt.bar()

In [33]:

plt.bar(x,y)

Out[33]:

<Container object of 5 artists>

In [29]:

plt.barh(x,y)

Out[29]:

<Container object of 5 artists>

Histograms

In [40]:

x = [1,1,2,3,4,5,5,5,6,7,7,7,7,7,7,7,8]
plt.hist(x,bins=20)

Out[40]:

(array([2., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 3., 0., 0., 1., 0., 0.,
        7., 0., 1.]),
 array([1.  , 1.35, 1.7 , 2.05, 2.4 , 2.75, 3.1 , 3.45, 3.8 , 4.15, 4.5 ,
        4.85, 5.2 , 5.55, 5.9 , 6.25, 6.6 , 6.95, 7.3 , 7.65, 8.  ]),
 <a list of 20 Patch objects>)
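
The tuple returned by `plt.hist` holds the counts, the bin edges, and the drawn patches. With `bins=20` over the range 1 to 8 each bin is only 0.35 wide, which is why most counts above are 0. The sketch below computes the counts with `np.histogram` using bin edges centred on the integers, so that each value gets exactly one bin; the choice of edges is an assumption for illustration.

```python
import numpy as np

x = [1, 1, 2, 3, 4, 5, 5, 5, 6, 7, 7, 7, 7, 7, 7, 7, 8]

# Edges at 0.5, 1.5, ..., 8.5 give one bin per integer from 1 to 8
edges = np.arange(0.5, 9.5, 1)
counts, _ = np.histogram(x, bins=edges)
print(counts)
```

The same `edges` array can be passed as `bins` to `plt.hist` to draw the corresponding chart.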

Pie charts

In [41]:

arr=[11,22,31,15]
plt.pie(arr)

Out[41]:

([<matplotlib.patches.Wedge at 0x1a72dae13c8>,
  <matplotlib.patches.Wedge at 0x1a72dae1748>,
  <matplotlib.patches.Wedge at 0x1a72c37ea58>,
  <matplotlib.patches.Wedge at 0x1a72bc76ac8>],
 [Text(0.996424,0.465981,''),
  Text(-0.195798,1.08243,''),
  Text(-0.830021,-0.721848,''),
  Text(0.910034,-0.61793,'')])

In [42]:

arr=[0.2,0.3,0.1]
plt.pie(arr)

Out[42]:

([<matplotlib.patches.Wedge at 0x1a72ef21630>,
  <matplotlib.patches.Wedge at 0x1a72ef21b00>,
  <matplotlib.patches.Wedge at 0x1a72ef28080>],
 [Text(0.889919,0.646564,''),
  Text(-0.646564,0.889919,''),
  Text(-1.04616,-0.339919,'')])

In [43]:

arr=[11,22,31,15]
plt.pie(arr,labels=['a','b','c','d'])

Out[43]:

([<matplotlib.patches.Wedge at 0x1a72ef61d68>,
  <matplotlib.patches.Wedge at 0x1a72ef69278>,
  <matplotlib.patches.Wedge at 0x1a72ef697b8>,
  <matplotlib.patches.Wedge at 0x1a72ef69cf8>],
 [Text(0.996424,0.465981,'a'),
  Text(-0.195798,1.08243,'b'),
  Text(-0.830021,-0.721848,'c'),
  Text(0.910034,-0.61793,'d')])

In [44]:

arr=[11,22,31,15]
plt.pie(arr,labels=['a','b','c','d'],labeldistance=0.3)

Out[44]:

([<matplotlib.patches.Wedge at 0x1a72efb18d0>,
  <matplotlib.patches.Wedge at 0x1a72efb1da0>,
  <matplotlib.patches.Wedge at 0x1a72efbb320>,
  <matplotlib.patches.Wedge at 0x1a72efbb860>],
 [Text(0.271752,0.127086,'a'),
  Text(-0.0533994,0.295209,'b'),
  Text(-0.226369,-0.196868,'c'),
  Text(0.248191,-0.168526,'d')])

In [45]:

arr=[11,22,31,15]
plt.pie(arr,labels=['a','b','c','d'],labeldistance=0.3,autopct='%.6f%%')

Out[45]:

([<matplotlib.patches.Wedge at 0x1a72f0024a8>,
  <matplotlib.patches.Wedge at 0x1a72f002ba8>,
  <matplotlib.patches.Wedge at 0x1a72f00a358>,
  <matplotlib.patches.Wedge at 0x1a72f00aac8>],
 [Text(0.271752,0.127086,'a'),
  Text(-0.0533994,0.295209,'b'),
  Text(-0.226369,-0.196868,'c'),
  Text(0.248191,-0.168526,'d')],
 [Text(0.543504,0.254171,'13.924050%'),
  Text(-0.106799,0.590419,'27.848101%'),
  Text(-0.452739,-0.393735,'39.240506%'),
  Text(0.496382,-0.337053,'18.987341%')])

In [46]:

arr=[11,22,31,15]
plt.pie(arr,labels=['a','b','c','d'],labeldistance=0.3,shadow=True,explode=[0.2,0.3,0.2,0.4])

Out[46]:

([<matplotlib.patches.Wedge at 0x1a72f04e940>,
  <matplotlib.patches.Wedge at 0x1a72f056128>,
  <matplotlib.patches.Wedge at 0x1a72f056940>,
  <matplotlib.patches.Wedge at 0x1a72f062198>],
 [Text(0.45292,0.21181,'a'),
  Text(-0.106799,0.590419,'b'),
  Text(-0.377282,-0.328113,'c'),
  Text(0.579113,-0.393228,'d')])

Scatter plots: scatter()

In [49]:

x = np.array([1,2,3,4,5])
y = x ** 2
plt.scatter(x,y)

Out[49]:

<matplotlib.collections.PathCollection at 0x1a72f089438>

In [51]:

x = np.random.random(size=(20,))
y = np.random.random(size=(20,))
plt.scatter(x,y)

Out[51]:

<matplotlib.collections.PathCollection at 0x1a72fe59e10>

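Project requirements

Part 1: data type handling

Part 2: monthly data analysis

Part 3: analysis of individual users' spending data

Part 4: analysis of users' spending behavior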

Source: https://www.cnblogs.com/bky20061005/p/12233244.html