
Kaggle House Price Prediction Project, Part 1

Author: Internet

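Project introduction

Project link

Project goal

Data description

Goal: predict the sale price of each house. For each Id in the test set, you must predict the value of the SalePrice variable.

Evaluation metric

Submissions are evaluated on the root mean squared error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price. (Taking logs means that errors in predicting expensive houses and cheap houses affect the result equally.)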
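
As a rough sketch of the metric described above (not Kaggle's own scoring code), the RMSE of the logs can be computed with the standard library alone. The function name rmse_of_logs is made up for illustration. The example checks that a 10% overprediction is penalized equally for a cheap house and an expensive one.

```python
import math

def rmse_of_logs(y_true, y_pred):
    """RMSE between log(predicted price) and log(observed price)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared = [(math.log(p) - math.log(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative prices only: both predictions are 10% too high.
cheap = rmse_of_logs([100_000], [110_000])
expensive = rmse_of_logs([500_000], [550_000])
print(cheap, expensive)
```

Because the error depends only on the ratio of prediction to truth, both calls give the same value, log(1.1).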

Loading the datasets

Import the libraries and read the data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns


from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score

import warnings
warnings.filterwarnings('ignore')
# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Show up to 100 characters per cell (the default is 50)
pd.set_option('display.max_colwidth', 100)
data_sample_submission = pd.read_csv('./data/sample_submission.csv')
data_train = pd.read_csv('./data/train.csv')
data_test = pd.read_csv('./data/test.csv')

Basic information

data_sample_submission.head()
     Id      SalePrice
0  1461  169277.052498
1  1462  187758.393989
2  1463  183583.683570
3  1464  179317.477511
4  1465  150730.079977
data_sample_submission.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Id         1459 non-null   int64  
 1   SalePrice  1459 non-null   float64
dtypes: float64(1), int64(1)
memory usage: 22.9 KB
data_train.head()
(5 rows x 81 columns, shown transposed: one line per column, one value per row index)
               0        1        2        3        4
Id             1        2        3        4        5
MSSubClass     60       20       60       70       60
MSZoning       RL       RL       RL       RL       RL
LotFrontage    65.0     80.0     68.0     60.0     84.0
LotArea        8450     9600     11250    9550     14260
Street         Pave     Pave     Pave     Pave     Pave
Alley          NaN      NaN      NaN      NaN      NaN
LotShape       Reg      Reg      IR1      IR1      IR1
LandContour    Lvl      Lvl      Lvl      Lvl      Lvl
Utilities      AllPub   AllPub   AllPub   AllPub   AllPub
LotConfig      Inside   FR2      Inside   Corner   FR2
LandSlope      Gtl      Gtl      Gtl      Gtl      Gtl
Neighborhood   CollgCr  Veenker  CollgCr  Crawfor  NoRidge
Condition1     Norm     Feedr    Norm     Norm     Norm
Condition2     Norm     Norm     Norm     Norm     Norm
BldgType       1Fam     1Fam     1Fam     1Fam     1Fam
HouseStyle     2Story   1Story   2Story   2Story   2Story
OverallQual    7        6        7        7        8
OverallCond    5        8        5        5        5
YearBuilt      2003     1976     2001     1915     2000
YearRemodAdd   2003     1976     2002     1970     2000
RoofStyle      Gable    Gable    Gable    Gable    Gable
RoofMatl       CompShg  CompShg  CompShg  CompShg  CompShg
Exterior1st    VinylSd  MetalSd  VinylSd  Wd Sdng  VinylSd
Exterior2nd    VinylSd  MetalSd  VinylSd  Wd Shng  VinylSd
MasVnrType     BrkFace  None     BrkFace  None     BrkFace
MasVnrArea     196.0    0.0      162.0    0.0      350.0
ExterQual      Gd       TA       Gd       TA       Gd
ExterCond      TA       TA       TA       TA       TA
Foundation     PConc    CBlock   PConc    BrkTil   PConc
BsmtQual       Gd       Gd       Gd       TA       Gd
BsmtCond       TA       TA       TA       Gd       TA
BsmtExposure   No       Gd       Mn       No       Av
BsmtFinType1   GLQ      ALQ      GLQ      ALQ      GLQ
BsmtFinSF1     706      978      486      216      655
BsmtFinType2   Unf      Unf      Unf      Unf      Unf
BsmtFinSF2     0        0        0        0        0
BsmtUnfSF      150      284      434      540      490
TotalBsmtSF    856      1262     920      756      1145
Heating        GasA     GasA     GasA     GasA     GasA
HeatingQC      Ex       Ex       Ex       Gd       Ex
CentralAir     Y        Y        Y        Y        Y
Electrical     SBrkr    SBrkr    SBrkr    SBrkr    SBrkr
1stFlrSF       856      1262     920      961      1145
2ndFlrSF       854      0        866      756      1053
LowQualFinSF   0        0        0        0        0
GrLivArea      1710     1262     1786     1717     2198
BsmtFullBath   1        0        1        1        1
BsmtHalfBath   0        1        0        0        0
FullBath       2        2        2        1        2
HalfBath       1        0        1        0        1
BedroomAbvGr   3        3        3        3        4
KitchenAbvGr   1        1        1        1        1
KitchenQual    Gd       TA       Gd       Gd       Gd
TotRmsAbvGrd   8        6        6        7        9
Functional     Typ      Typ      Typ      Typ      Typ
Fireplaces     0        1        1        1        1
FireplaceQu    NaN      TA       TA       Gd       TA
GarageType     Attchd   Attchd   Attchd   Detchd   Attchd
GarageYrBlt    2003.0   1976.0   2001.0   1998.0   2000.0
GarageFinish   RFn      RFn      RFn      Unf      RFn
GarageCars     2        2        2        3        3
GarageArea     548      460      608      642      836
GarageQual     TA       TA       TA       TA       TA
GarageCond     TA       TA       TA       TA       TA
PavedDrive     Y        Y        Y        Y        Y
WoodDeckSF     0        298      0        0        192
OpenPorchSF    61       0        42       35       84
EnclosedPorch  0        0        0        272      0
3SsnPorch      0        0        0        0        0
ScreenPorch    0        0        0        0        0
PoolArea       0        0        0        0        0
PoolQC         NaN      NaN      NaN      NaN      NaN
Fence          NaN      NaN      NaN      NaN      NaN
MiscFeature    NaN      NaN      NaN      NaN      NaN
MiscVal        0        0        0        0        0
MoSold         2        5        9        2        12
YrSold         2008     2007     2008     2006     2008
SaleType       WD       WD       WD       WD       WD
SaleCondition  Normal   Normal   Normal   Abnorml  Normal
SalePrice      208500   181500   223500   140000   250000
data_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
 20  YearRemodAdd   1460 non-null   int64  
 21  RoofStyle      1460 non-null   object 
 22  RoofMatl       1460 non-null   object 
 23  Exterior1st    1460 non-null   object 
 24  Exterior2nd    1460 non-null   object 
 25  MasVnrType     1452 non-null   object 
 26  MasVnrArea     1452 non-null   float64
 27  ExterQual      1460 non-null   object 
 28  ExterCond      1460 non-null   object 
 29  Foundation     1460 non-null   object 
 30  BsmtQual       1423 non-null   object 
 31  BsmtCond       1423 non-null   object 
 32  BsmtExposure   1422 non-null   object 
 33  BsmtFinType1   1423 non-null   object 
 34  BsmtFinSF1     1460 non-null   int64  
 35  BsmtFinType2   1422 non-null   object 
 36  BsmtFinSF2     1460 non-null   int64  
 37  BsmtUnfSF      1460 non-null   int64  
 38  TotalBsmtSF    1460 non-null   int64  
 39  Heating        1460 non-null   object 
 40  HeatingQC      1460 non-null   object 
 41  CentralAir     1460 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1460 non-null   int64  
 44  2ndFlrSF       1460 non-null   int64  
 45  LowQualFinSF   1460 non-null   int64  
 46  GrLivArea      1460 non-null   int64  
 47  BsmtFullBath   1460 non-null   int64  
 48  BsmtHalfBath   1460 non-null   int64  
 49  FullBath       1460 non-null   int64  
 50  HalfBath       1460 non-null   int64  
 51  BedroomAbvGr   1460 non-null   int64  
 52  KitchenAbvGr   1460 non-null   int64  
 53  KitchenQual    1460 non-null   object 
 54  TotRmsAbvGrd   1460 non-null   int64  
 55  Functional     1460 non-null   object 
 56  Fireplaces     1460 non-null   int64  
 57  FireplaceQu    770 non-null    object 
 58  GarageType     1379 non-null   object 
 59  GarageYrBlt    1379 non-null   float64
 60  GarageFinish   1379 non-null   object 
 61  GarageCars     1460 non-null   int64  
 62  GarageArea     1460 non-null   int64  
 63  GarageQual     1379 non-null   object 
 64  GarageCond     1379 non-null   object 
 65  PavedDrive     1460 non-null   object 
 66  WoodDeckSF     1460 non-null   int64  
 67  OpenPorchSF    1460 non-null   int64  
 68  EnclosedPorch  1460 non-null   int64  
 69  3SsnPorch      1460 non-null   int64  
 70  ScreenPorch    1460 non-null   int64  
 71  PoolArea       1460 non-null   int64  
 72  PoolQC         7 non-null      object 
 73  Fence          281 non-null    object 
 74  MiscFeature    54 non-null     object 
 75  MiscVal        1460 non-null   int64  
 76  MoSold         1460 non-null   int64  
 77  YrSold         1460 non-null   int64  
 78  SaleType       1460 non-null   object 
 79  SaleCondition  1460 non-null   object 
 80  SalePrice      1460 non-null   int64  
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
data_test.head()
(5 rows x 80 columns, shown transposed: one line per column, one value per row index)
               0        1        2        3        4
Id             1461     1462     1463     1464     1465
MSSubClass     20       20       60       60       120
MSZoning       RH       RL       RL       RL       RL
LotFrontage    80.0     81.0     74.0     78.0     43.0
LotArea        11622    14267    13830    9978     5005
Street         Pave     Pave     Pave     Pave     Pave
Alley          NaN      NaN      NaN      NaN      NaN
LotShape       Reg      IR1      IR1      IR1      IR1
LandContour    Lvl      Lvl      Lvl      Lvl      HLS
Utilities      AllPub   AllPub   AllPub   AllPub   AllPub
LotConfig      Inside   Corner   Inside   Inside   Inside
LandSlope      Gtl      Gtl      Gtl      Gtl      Gtl
Neighborhood   NAmes    NAmes    Gilbert  Gilbert  StoneBr
Condition1     Feedr    Norm     Norm     Norm     Norm
Condition2     Norm     Norm     Norm     Norm     Norm
BldgType       1Fam     1Fam     1Fam     1Fam     TwnhsE
HouseStyle     1Story   1Story   2Story   2Story   1Story
OverallQual    5        6        5        6        8
OverallCond    6        6        5        6        5
YearBuilt      1961     1958     1997     1998     1992
YearRemodAdd   1961     1958     1998     1998     1992
RoofStyle      Gable    Hip      Gable    Gable    Gable
RoofMatl       CompShg  CompShg  CompShg  CompShg  CompShg
Exterior1st    VinylSd  Wd Sdng  VinylSd  VinylSd  HdBoard
Exterior2nd    VinylSd  Wd Sdng  VinylSd  VinylSd  HdBoard
MasVnrType     None     BrkFace  None     BrkFace  None
MasVnrArea     0.0      108.0    0.0      20.0     0.0
ExterQual      TA       TA       TA       TA       Gd
ExterCond      TA       TA       TA       TA       TA
Foundation     CBlock   CBlock   PConc    PConc    PConc
BsmtQual       TA       TA       Gd       TA       Gd
BsmtCond       TA       TA       TA       TA       TA
BsmtExposure   No       No       No       No       No
BsmtFinType1   Rec      ALQ      GLQ      GLQ      ALQ
BsmtFinSF1     468.0    923.0    791.0    602.0    263.0
BsmtFinType2   LwQ      Unf      Unf      Unf      Unf
BsmtFinSF2     144.0    0.0      0.0      0.0      0.0
BsmtUnfSF      270.0    406.0    137.0    324.0    1017.0
TotalBsmtSF    882.0    1329.0   928.0    926.0    1280.0
Heating        GasA     GasA     GasA     GasA     GasA
HeatingQC      TA       TA       Gd       Ex       Ex
CentralAir     Y        Y        Y        Y        Y
Electrical     SBrkr    SBrkr    SBrkr    SBrkr    SBrkr
1stFlrSF       896      1329     928      926      1280
2ndFlrSF       0        0        701      678      0
LowQualFinSF   0        0        0        0        0
GrLivArea      896      1329     1629     1604     1280
BsmtFullBath   0.0      0.0      0.0      0.0      0.0
BsmtHalfBath   0.0      0.0      0.0      0.0      0.0
FullBath       1        1        2        2        2
HalfBath       0        1        1        1        0
BedroomAbvGr   2        3        3        3        2
KitchenAbvGr   1        1        1        1        1
KitchenQual    TA       Gd       TA       Gd       Gd
TotRmsAbvGrd   5        6        6        7        5
Functional     Typ      Typ      Typ      Typ      Typ
Fireplaces     0        0        1        1        0
FireplaceQu    NaN      NaN      TA       Gd       NaN
GarageType     Attchd   Attchd   Attchd   Attchd   Attchd
GarageYrBlt    1961.0   1958.0   1997.0   1998.0   1992.0
GarageFinish   Unf      Unf      Fin      Fin      RFn
GarageCars     1.0      1.0      2.0      2.0      2.0
GarageArea     730.0    312.0    482.0    470.0    506.0
GarageQual     TA       TA       TA       TA       TA
GarageCond     TA       TA       TA       TA       TA
PavedDrive     Y        Y        Y        Y        Y
WoodDeckSF     140      393      212      360      0
OpenPorchSF    0        36       34       36       82
EnclosedPorch  0        0        0        0        0
3SsnPorch      0        0        0        0        0
ScreenPorch    120      0        0        0        144
PoolArea       0        0        0        0        0
PoolQC         NaN      NaN      NaN      NaN      NaN
Fence          MnPrv    NaN      MnPrv    NaN      NaN
MiscFeature    NaN      Gar2     NaN      NaN      NaN
MiscVal        0        12500    0        0        0
MoSold         6        6        3        6        1
YrSold         2010     2010     2010     2010     2010
SaleType       WD       WD       WD       WD       WD
SaleCondition  Normal   Normal   Normal   Normal   Normal
data_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 80 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1459 non-null   int64  
 1   MSSubClass     1459 non-null   int64  
 2   MSZoning       1455 non-null   object 
 3   LotFrontage    1232 non-null   float64
 4   LotArea        1459 non-null   int64  
 5   Street         1459 non-null   object 
 6   Alley          107 non-null    object 
 7   LotShape       1459 non-null   object 
 8   LandContour    1459 non-null   object 
 9   Utilities      1457 non-null   object 
 10  LotConfig      1459 non-null   object 
 11  LandSlope      1459 non-null   object 
 12  Neighborhood   1459 non-null   object 
 13  Condition1     1459 non-null   object 
 14  Condition2     1459 non-null   object 
 15  BldgType       1459 non-null   object 
 16  HouseStyle     1459 non-null   object 
 17  OverallQual    1459 non-null   int64  
 18  OverallCond    1459 non-null   int64  
 19  YearBuilt      1459 non-null   int64  
 20  YearRemodAdd   1459 non-null   int64  
 21  RoofStyle      1459 non-null   object 
 22  RoofMatl       1459 non-null   object 
 23  Exterior1st    1458 non-null   object 
 24  Exterior2nd    1458 non-null   object 
 25  MasVnrType     1443 non-null   object 
 26  MasVnrArea     1444 non-null   float64
 27  ExterQual      1459 non-null   object 
 28  ExterCond      1459 non-null   object 
 29  Foundation     1459 non-null   object 
 30  BsmtQual       1415 non-null   object 
 31  BsmtCond       1414 non-null   object 
 32  BsmtExposure   1415 non-null   object 
 33  BsmtFinType1   1417 non-null   object 
 34  BsmtFinSF1     1458 non-null   float64
 35  BsmtFinType2   1417 non-null   object 
 36  BsmtFinSF2     1458 non-null   float64
 37  BsmtUnfSF      1458 non-null   float64
 38  TotalBsmtSF    1458 non-null   float64
 39  Heating        1459 non-null   object 
 40  HeatingQC      1459 non-null   object 
 41  CentralAir     1459 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1459 non-null   int64  
 44  2ndFlrSF       1459 non-null   int64  
 45  LowQualFinSF   1459 non-null   int64  
 46  GrLivArea      1459 non-null   int64  
 47  BsmtFullBath   1457 non-null   float64
 48  BsmtHalfBath   1457 non-null   float64
 49  FullBath       1459 non-null   int64  
 50  HalfBath       1459 non-null   int64  
 51  BedroomAbvGr   1459 non-null   int64  
 52  KitchenAbvGr   1459 non-null   int64  
 53  KitchenQual    1458 non-null   object 
 54  TotRmsAbvGrd   1459 non-null   int64  
 55  Functional     1457 non-null   object 
 56  Fireplaces     1459 non-null   int64  
 57  FireplaceQu    729 non-null    object 
 58  GarageType     1383 non-null   object 
 59  GarageYrBlt    1381 non-null   float64
 60  GarageFinish   1381 non-null   object 
 61  GarageCars     1458 non-null   float64
 62  GarageArea     1458 non-null   float64
 63  GarageQual     1381 non-null   object 
 64  GarageCond     1381 non-null   object 
 65  PavedDrive     1459 non-null   object 
 66  WoodDeckSF     1459 non-null   int64  
 67  OpenPorchSF    1459 non-null   int64  
 68  EnclosedPorch  1459 non-null   int64  
 69  3SsnPorch      1459 non-null   int64  
 70  ScreenPorch    1459 non-null   int64  
 71  PoolArea       1459 non-null   int64  
 72  PoolQC         3 non-null      object 
 73  Fence          290 non-null    object 
 74  MiscFeature    51 non-null     object 
 75  MiscVal        1459 non-null   int64  
 76  MoSold         1459 non-null   int64  
 77  YrSold         1459 non-null   int64  
 78  SaleType       1458 non-null   object 
 79  SaleCondition  1459 non-null   object 
dtypes: float64(11), int64(26), object(43)
memory usage: 912.0+ KB
data_train.describe()
(shown transposed: one line per numeric column; trailing zeros trimmed)
               count  mean           std           min    25%       50%      75%       max
Id             1460   730.500000     421.610009    1      365.75    730.5    1095.25   1460
MSSubClass     1460   56.897260      42.300571     20     20        50       70        190
LotFrontage    1201   70.049958      24.284752     21     59        69       80        313
LotArea        1460   10516.828082   9981.264932   1300   7553.5    9478.5   11601.5   215245
OverallQual    1460   6.099315       1.382997      1      5         6        7         10
OverallCond    1460   5.575342       1.112799      1      5         5        6         9
YearBuilt      1460   1971.267808    30.202904     1872   1954      1973     2000      2010
YearRemodAdd   1460   1984.865753    20.645407     1950   1967      1994     2004      2010
MasVnrArea     1452   103.685262     181.066207    0      0         0        166       1600
BsmtFinSF1     1460   443.639726     456.098091    0      0         383.5    712.25    5644
BsmtFinSF2     1460   46.549315      161.319273    0      0         0        0         1474
BsmtUnfSF      1460   567.240411     441.866955    0      223       477.5    808       2336
TotalBsmtSF    1460   1057.429452    438.705324    0      795.75    991.5    1298.25   6110
1stFlrSF       1460   1162.626712    386.587738    334    882       1087     1391.25   4692
2ndFlrSF       1460   346.992466     436.528436    0      0         0        728       2065
LowQualFinSF   1460   5.844521       48.623081     0      0         0        0         572
GrLivArea      1460   1515.463699    525.480383    334    1129.5    1464     1776.75   5642
BsmtFullBath   1460   0.425342       0.518911      0      0         0        1         3
BsmtHalfBath   1460   0.057534       0.238753      0      0         0        0         2
FullBath       1460   1.565068       0.550916      0      1         2        2         3
HalfBath       1460   0.382877       0.502885      0      0         0        1         2
BedroomAbvGr   1460   2.866438       0.815778      0      2         3        3         8
KitchenAbvGr   1460   1.046575       0.220338      0      1         1        1         3
TotRmsAbvGrd   1460   6.517808       1.625393      2      5         6        7         14
Fireplaces     1460   0.613014       0.644666      0      0         1        1         3
GarageYrBlt    1379   1978.506164    24.689725     1900   1961      1980     2002      2010
GarageCars     1460   1.767123       0.747315      0      1         2        2         4
GarageArea     1460   472.980137     213.804841    0      334.5     480      576       1418
WoodDeckSF     1460   94.244521      125.338794    0      0         0        168       857
OpenPorchSF    1460   46.660274      66.256028     0      0         25       68        547
EnclosedPorch  1460   21.954110      61.119149     0      0         0        0         552
3SsnPorch      1460   3.409589       29.317331     0      0         0        0         508
ScreenPorch    1460   15.060959      55.757415     0      0         0        0         480
PoolArea       1460   2.758904       40.177307     0      0         0        0         738
MiscVal        1460   43.489041      496.123024    0      0         0        0         15500
MoSold         1460   6.321918       2.703626      1      5         6        8         12
YrSold         1460   2007.815753    1.328095      2006   2007      2008     2009      2010
SalePrice      1460   180921.195890  79442.502883  34900  129975    163000   214000    755000
data_train.shape
(1460, 81)
data_test.shape
(1459, 80)
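
The shapes suggest that the test set simply lacks the SalePrice column, while the info() output shows that some columns are int64 in the training set but float64 in the test set (for example GarageCars, which has a missing value in the test set). A small helper can confirm both points. The name compare_columns and the toy frames below are hypothetical; in the notebook you would pass data_train and data_test instead.

```python
import pandas as pd

def compare_columns(train, test):
    """Return columns unique to each frame and shared columns whose dtypes differ."""
    only_train = [c for c in train.columns if c not in test.columns]
    only_test = [c for c in test.columns if c not in train.columns]
    shared = [c for c in train.columns if c in test.columns]
    dtype_diff = {
        c: (str(train[c].dtype), str(test[c].dtype))
        for c in shared
        if train[c].dtype != test[c].dtype
    }
    return only_train, only_test, dtype_diff

# Toy stand-ins for data_train / data_test (values are illustrative only).
toy_train = pd.DataFrame({"Id": [1, 2], "GarageCars": [2, 3], "SalePrice": [208500, 181500]})
toy_test = pd.DataFrame({"Id": [1461, 1462], "GarageCars": [1.0, None]})

only_train, only_test, dtype_diff = compare_columns(toy_train, toy_test)
print(only_train, only_test, dtype_diff)
```

A missing value forces an integer column to float, which is why the dtype difference is a quick hint about where the test set has gaps that the training set does not.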

Exploratory data analysis (EDA)

Missing data

# Summarize missing values per column: count and percentage of rows
def missing_data(data):
    total = data.isnull().sum().sort_values(ascending = False)
    percent = (data.isnull().sum()/data.isnull().count()*100).sort_values(ascending = False)
    return pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data(data_train)
               Total  Percent
PoolQC         1453   99.520548
MiscFeature    1406   96.301370
Alley          1369   93.767123
Fence          1179   80.753425
FireplaceQu    690    47.260274
LotFrontage    259    17.739726
GarageCond     81     5.547945
GarageType     81     5.547945
GarageYrBlt    81     5.547945
GarageFinish   81     5.547945
GarageQual     81     5.547945
BsmtExposure   38     2.602740
BsmtFinType2   38     2.602740
BsmtFinType1   37     2.534247
BsmtCond       37     2.534247
BsmtQual       37     2.534247
MasVnrArea     8      0.547945
MasVnrType     8      0.547945
Electrical     1      0.068493
Utilities      0      0.000000
YearRemodAdd   0      0.000000
MSSubClass     0      0.000000
Foundation     0      0.000000
ExterCond      0      0.000000
ExterQual      0      0.000000
Exterior2nd    0      0.000000
Exterior1st    0      0.000000
RoofMatl       0      0.000000
RoofStyle      0      0.000000
YearBuilt      0      0.000000
LotConfig      0      0.000000
OverallCond    0      0.000000
OverallQual    0      0.000000
HouseStyle     0      0.000000
BldgType       0      0.000000
Condition2     0      0.000000
BsmtFinSF1     0      0.000000
MSZoning       0      0.000000
LotArea        0      0.000000
Street         0      0.000000
Condition1     0      0.000000
Neighborhood   0      0.000000
LotShape       0      0.000000
LandContour    0      0.000000
LandSlope      0      0.000000
SalePrice      0      0.000000
HeatingQC      0      0.000000
BsmtFinSF2     0      0.000000
EnclosedPorch  0      0.000000
Fireplaces     0      0.000000
GarageCars     0      0.000000
GarageArea     0      0.000000
PavedDrive     0      0.000000
WoodDeckSF     0      0.000000
OpenPorchSF    0      0.000000
3SsnPorch      0      0.000000
BsmtUnfSF      0      0.000000
ScreenPorch    0      0.000000
PoolArea       0      0.000000
MiscVal        0      0.000000
MoSold         0      0.000000
YrSold         0      0.000000
SaleType       0      0.000000
Functional     0      0.000000
TotRmsAbvGrd   0      0.000000
KitchenQual    0      0.000000
KitchenAbvGr   0      0.000000
BedroomAbvGr   0      0.000000
HalfBath       0      0.000000
FullBath       0      0.000000
BsmtHalfBath   0      0.000000
BsmtFullBath   0      0.000000
GrLivArea      0      0.000000
LowQualFinSF   0      0.000000
2ndFlrSF       0      0.000000
1stFlrSF       0      0.000000
CentralAir     0      0.000000
SaleCondition  0      0.000000
Heating        0      0.000000
TotalBsmtSF    0      0.000000
Id             0      0.000000
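
Most rows of the summary are zero, so it can help to keep only the columns that actually have gaps. The variant below is a sketch; missing_only is a hypothetical name, and the toy frame stands in for data_train.

```python
import pandas as pd

def missing_only(data):
    """Like missing_data, but keep only columns with at least one missing value."""
    total = data.isnull().sum()
    percent = total / len(data) * 100
    summary = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
    return summary[summary["Total"] > 0].sort_values("Total", ascending=False)

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "PoolQC": [None, None, None, "Gd"],
    "LotFrontage": [65.0, None, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})
result = missing_only(toy)
print(result)
```

On the real training set this would reduce the table to the 19 columns with a non-zero count.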

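Exploring the features

# Discrete features: value counts (left) and SalePrice box plot per level (right)
def lisan_plot(column, data):
    fig = plt.figure(figsize=(10,4))
    plt.subplot2grid((1,2),(0,0))
    counts = data[column].value_counts()
    sns.barplot(x=counts.index, y=counts.values)
    plt.title(column)
    plt.ylabel('Count')
    
    plt.subplot2grid((1,2),(0,1))
    sns.boxplot(x=column, y='SalePrice', data=data)
    plt.show()
    
# Continuous features: distribution (left) and scatter plot against SalePrice (right)
def lianxu_plot(column, data):
    fig = plt.figure(figsize=(10,4))
    plt.subplot2grid((1,2),(0,0))
    # histplot replaces the deprecated distplot
    sns.histplot(data[column].dropna(), kde=True)
    plt.xlabel(column)
    plt.ylabel('Count')
    
    plt.subplot2grid((1,2),(0,1))
    # Pass x and y by keyword; recent seaborn versions reject positional x and y
    sns.scatterplot(x=column, y='SalePrice', data=data)
    plt.show()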
1.MSSubClass:

Identifies the type of dwelling involved in the sale.

column = 'MSSubClass'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
15
20     536
60     299
50     144
120     87
30      69
160     63
70      60
80      58
90      52
190     30
85      20
75      16
45      12
180     10
40       4
Name: MSSubClass, dtype: int64


2.MSZoning:

Identifies the general zoning classification of the sale.

column = 'MSZoning'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
RL         1151
RM          218
FV           65
RH           16
C (all)      10
Name: MSZoning, dtype: int64


3.LotFrontage:

Linear feet of street connected to property

column = 'LotFrontage'
print(len(data_train[column].unique()))
print('Max and min:', data_train[column].max(), data_train[column].min())
print(data_train[column].unique())
lianxu_plot(column, data_train)
111
Max and min: 313.0 21.0
[ 65.  80.  68.  60.  84.  85.  75.  nan  51.  50.  70.  91.  72.  66.
 101.  57.  44. 110.  98.  47. 108. 112.  74. 115.  61.  48.  33.  52.
 100.  24.  89.  63.  76.  81.  95.  69.  21.  32.  78. 121. 122.  40.
 105.  73.  77.  64.  94.  34.  90.  55.  88.  82.  71. 120. 107.  92.
 134.  62.  86. 141.  97.  54.  41.  79. 174.  99.  67.  83.  43. 103.
  93.  30. 129. 140.  35.  37. 118.  87. 116. 150. 111.  49.  96.  59.
  36.  56. 102.  58.  38. 109. 130.  53. 137.  45. 106. 104.  42.  39.
 144. 114. 128. 149. 313. 168. 182. 138. 160. 152. 124. 153.  46.]

Possible outliers: data_train[data_train['LotFrontage'] > 300]

4.LotArea:

Lot size in square feet

column = 'LotArea'
print(len(data_train[column].unique()))
print('Max and min:', data_train[column].max(), data_train[column].min())
print(data_train[column].unique())
lianxu_plot(column, data_train)
1073
Max and min: 215245 1300
[ 8450  9600 11250 ... 17217 13175  9717]

Possible outliers: data_train[data_train['LotArea'] > 100000]
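
The two thresholds noted so far (LotFrontage above 300 and LotArea above 100000) can be checked in one pass. This is only a sketch for inspecting candidate rows: flag_outliers is a hypothetical helper, the toy values are illustrative, and whether to drop such rows is a separate modeling decision that the text above does not settle.

```python
import pandas as pd

def flag_outliers(data, thresholds):
    """Boolean mask: True where any listed column exceeds its threshold."""
    mask = pd.Series(False, index=data.index)
    for column, limit in thresholds.items():
        # Comparisons with NaN evaluate to False, so missing values are never flagged.
        mask |= data[column] > limit
    return mask

# Toy stand-in for data_train.
toy = pd.DataFrame({
    "LotFrontage": [65.0, 313.0, None, 80.0],
    "LotArea": [8450, 63887, 215245, 9600],
})
mask = flag_outliers(toy, {"LotFrontage": 300, "LotArea": 100000})
candidates = toy[mask]
print(candidates)
```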

5.Street:

Type of road access to property

column = 'Street'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
Pave    1454
Grvl       6
Name: Street, dtype: int64


6.Alley:

Type of alley access to property

column = 'Alley'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Grvl    50
Pave    41
Name: Alley, dtype: int64
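
The output above reports 3 unique values but lists only two categories. The reason is that unique() counts NaN as a value, while value_counts() drops it by default. A small sketch on toy data (not the real column) makes the difference visible:

```python
import numpy as np
import pandas as pd

# Toy stand-in for data_train['Alley'].
toy_alley = pd.Series(["Grvl", np.nan, "Pave", np.nan, np.nan], name="Alley")

n_unique_with_nan = len(toy_alley.unique())        # NaN counted as a value
n_unique_without_nan = toy_alley.nunique()         # NaN excluded
counts_with_nan = toy_alley.value_counts(dropna=False)  # NaN shown as its own row
print(n_unique_with_nan, n_unique_without_nan)
print(counts_with_nan)
```

Passing dropna=False is a convenient way to see how many rows are missing alongside the real categories.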


7.LotShape:

General shape of property

   Reg	Regular
   IR1	Slightly irregular
   IR2	Moderately Irregular
   IR3	Irregular
column = 'LotShape'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Reg    925
IR1    484
IR2     41
IR3     10
Name: LotShape, dtype: int64


8.LandContour:

Flatness of the property

   Lvl	Near Flat/Level
   Bnk	Banked - Quick and significant rise from street grade to building
   HLS	Hillside - Significant slope from side to side
   Low	Depression
column = 'LandContour'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Lvl    1311
Bnk      63
HLS      50
Low      36
Name: LandContour, dtype: int64


9.Utilities:

Type of utilities available

   AllPub	All public Utilities (E,G,W,& S)
   NoSewr	Electricity, Gas, and Water (Septic Tank)
   NoSeWa	Electricity and Gas Only
   ELO	Electricity only
column = 'Utilities'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
AllPub    1459
NoSeWa       1
Name: Utilities, dtype: int64
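
Here a single level covers 1459 of 1460 rows (about 99.9%), so the column carries almost no information. One way to spot such columns is to compute the share of the most frequent value. The helper name dominant_share is hypothetical and the toy data is illustrative.

```python
import pandas as pd

def dominant_share(series):
    """Fraction of non-missing rows taken by the most frequent value."""
    counts = series.value_counts()
    return counts.iloc[0] / counts.sum()

# Toy stand-in for two categorical columns.
toy = pd.DataFrame({
    "Utilities": ["AllPub"] * 9 + ["NoSeWa"],
    "LotShape": ["Reg"] * 6 + ["IR1"] * 4,
})
shares = {column: dominant_share(toy[column]) for column in toy.columns}
print(shares)
```

Any cut-off for calling a column "near constant" is a judgment call; the share is just a number to sort by.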


10.LotConfig:

Lot configuration

   Inside	Inside lot
   Corner	Corner lot
   CulDSac	Cul-de-sac
   FR2	Frontage on 2 sides of property
   FR3	Frontage on 3 sides of property
column = 'LotConfig'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Inside     1052
Corner      263
CulDSac      94
FR2          47
FR3           4
Name: LotConfig, dtype: int64


11.LandSlope:

Slope of property

   Gtl	Gentle slope
   Mod	Moderate Slope
   Sev	Severe Slope
column = 'LandSlope'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Gtl    1382
Mod      65
Sev      13
Name: LandSlope, dtype: int64


12.Neighborhood:

Physical locations within Ames city limits

   Blmngtn	Bloomington Heights
   Blueste	Bluestem
   BrDale	Briardale
   BrkSide	Brookside
   ClearCr	Clear Creek
   CollgCr	College Creek
   Crawfor	Crawford
   Edwards	Edwards
   Gilbert	Gilbert
   IDOTRR	Iowa DOT and Rail Road
   MeadowV	Meadow Village
   Mitchel	Mitchell
   NAmes	North Ames
   NoRidge	Northridge
   NPkVill	Northpark Villa
   NridgHt	Northridge Heights
   NWAmes	Northwest Ames
   OldTown	Old Town
   SWISU	South & West of Iowa State University
   Sawyer	Sawyer
   SawyerW	Sawyer West
   Somerst	Somerset
   StoneBr	Stone Brook
   Timber	Timberland
   Veenker	Veenker
column = 'Neighborhood'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
25
NAmes      225
CollgCr    150
OldTown    113
Edwards    100
Somerst     86
Gilbert     79
NridgHt     77
Sawyer      74
NWAmes      73
SawyerW     59
BrkSide     58
Crawfor     51
Mitchel     49
NoRidge     41
Timber      38
IDOTRR      37
ClearCr     28
StoneBr     25
SWISU       25
Blmngtn     17
MeadowV     17
BrDale      16
Veenker     11
NPkVill      9
Blueste      2
Name: Neighborhood, dtype: int64


13.Condition1:

Proximity to various conditions

   Artery	Adjacent to arterial street
   Feedr	Adjacent to feeder street
   Norm	Normal
   RRNn	Within 200' of North-South Railroad
   RRAn	Adjacent to North-South Railroad
   PosN	Near positive off-site feature--park, greenbelt, etc.
   PosA	Adjacent to positive off-site feature
   RRNe	Within 200' of East-West Railroad
   RRAe	Adjacent to East-West Railroad
column = 'Condition1'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
Norm      1260
Feedr       81
Artery      48
RRAn        26
PosN        19
RRAe        11
PosA         8
RRNn         5
RRNe         2
Name: Condition1, dtype: int64


14.Condition2:

Proximity to various conditions (if more than one is present)

   Artery	Adjacent to arterial street
   Feedr	Adjacent to feeder street
   Norm	Normal
   RRNn	Within 200' of North-South Railroad
   RRAn	Adjacent to North-South Railroad
   PosN	Near positive off-site feature--park, greenbelt, etc.
   PosA	Adjacent to positive off-site feature
   RRNe	Within 200' of East-West Railroad
   RRAe	Adjacent to East-West Railroad
column = 'Condition2'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
Norm      1445
Feedr        6
Artery       2
PosN         2
RRNn         2
PosA         1
RRAn         1
RRAe         1
Name: Condition2, dtype: int64


15.BldgType:

Type of dwelling

   1Fam	Single-family Detached
   2FmCon	Two-family Conversion; originally built as one-family dwelling (appears as 2fmCon in the data)
   Duplx	Duplex (appears as Duplex in the data)
   TwnhsE	Townhouse End Unit
   TwnhsI	Townhouse Inside Unit (likely the Twnhs value in the data)
column = 'BldgType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
1Fam      1220
TwnhsE     114
Duplex      52
Twnhs       43
2fmCon      31
Name: BldgType, dtype: int64


16.HouseStyle:

Style of dwelling

   1Story	One story
   1.5Fin	One and one-half story: 2nd level finished
   1.5Unf	One and one-half story: 2nd level unfinished
   2Story	Two story
   2.5Fin	Two and one-half story: 2nd level finished
   2.5Unf	Two and one-half story: 2nd level unfinished
   SFoyer	Split Foyer
   SLvl	Split Level
column = 'HouseStyle'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
1Story    726
2Story    445
1.5Fin    154
SLvl       65
SFoyer     37
1.5Unf     14
2.5Unf     11
2.5Fin      8
Name: HouseStyle, dtype: int64


17.OverallQual:

Rates the overall material and finish of the house

   10	Very Excellent
   9	Excellent
   8	Very Good
   7	Good
   6	Above Average
   5	Average
   4	Below Average
   3	Fair
   2	Poor
   1	Very Poor
column = 'OverallQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
10
5     397
6     374
7     319
8     168
4     116
9      43
3      20
10     18
2       3
1       2
Name: OverallQual, dtype: int64
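
Since OverallQual is an ordered 1 to 10 scale, a quick numeric companion to the box plot is the median SalePrice per quality level. The sketch below uses only the five rows shown by data_train.head() earlier, so the numbers are not representative of the full dataset; on the real data you would group data_train directly.

```python
import pandas as pd

# The five rows from data_train.head(): OverallQual and SalePrice only.
toy = pd.DataFrame({
    "OverallQual": [7, 6, 7, 7, 8],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Median price per quality level, ordered by level.
median_by_quality = toy.groupby("OverallQual")["SalePrice"].median()
print(median_by_quality)
```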


18.OverallCond:

Rates the overall condition of the house

   10	Very Excellent
   9	Excellent
   8	Very Good
   7	Good
   6	Above Average	
   5	Average
   4	Below Average	
   3	Fair
   2	Poor
   1	Very Poor
column = 'OverallCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
5    821
6    252
7    205
8     72
4     57
3     25
9     22
2      5
1      1
Name: OverallCond, dtype: int64


19.YearBuilt:

Original construction date

column = 'YearBuilt'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
112
2006    67
2005    64
2004    54
2007    49
2003    45
1976    33
1977    32
1920    30
1959    26
1999    25
1998    25
1958    24
1965    24
1970    24
1954    24
2000    24
2002    23
2008    23
1972    23
1968    22
1971    22
1950    20
2001    20
1957    20
1962    19
1994    19
1966    18
2009    18
1995    18
1940    18
1910    17
1960    17
1993    17
1978    16
1955    16
1925    16
1963    16
1967    16
1996    15
1941    15
1964    15
1969    14
1956    14
1961    14
1997    14
1948    14
1992    13
1990    12
1953    12
1949    12
1988    11
1973    11
1915    10
1900    10
1980    10
1974    10
1979     9
1926     9
1930     9
1936     9
1984     9
1939     8
1922     8
1975     8
1916     8
1924     7
1928     7
1918     7
1914     7
1923     7
1946     7
1935     6
1945     6
1931     6
1982     6
1921     6
1951     6
1985     5
1937     5
1947     5
1991     5
1981     5
1986     5
1952     5
1880     4
1929     4
1932     4
1938     4
1983     4
1927     3
1919     3
1934     3
1989     3
1987     3
1912     3
1885     2
1892     2
1890     2
1942     2
1908     2
1882     1
1875     1
1893     1
2010     1
1898     1
1904     1
1905     1
1906     1
1911     1
1913     1
1917     1
1872     1
Name: YearBuilt, dtype: int64

column = 'YearBuilt'
lianxu_plot(column, data_train)

20.YearRemodAdd:

Remodel date (same as construction date if no remodeling or additions)

column = 'YearRemodAdd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
61
1950    178
2006     97
2007     76
2005     73
2004     62
2000     55
2003     51
2002     48
2008     40
1996     36
1998     36
1995     31
1976     30
1999     30
1970     26
1997     25
1977     25
2009     23
1994     22
2001     21
1972     20
1965     19
1993     19
1971     18
1959     18
1968     17
1992     17
1978     16
1966     15
1958     15
1990     15
1962     14
1954     14
1969     14
1991     14
1963     13
1960     12
1967     12
1980     12
1973     11
1964     11
1989     11
1987     10
1975     10
1979     10
1956     10
1953     10
1957      9
1988      9
1955      9
1985      9
1961      8
1981      8
1974      7
1982      7
1984      7
2010      6
1983      5
1952      5
1986      5
1951      4
Name: YearRemodAdd, dtype: int64

column = 'YearRemodAdd'
lianxu_plot(column, data_train)
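
Since YearRemodAdd equals YearBuilt when a house was never remodeled, comparing the two columns gives a simple remodel indicator. The sketch below runs on a small hand-made frame rather than train.csv; the `Remodeled` and `YearsSinceRemod` column names are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

# Toy rows standing in for data_train (values are illustrative, not read from train.csv).
toy = pd.DataFrame({
    'YearBuilt':    [2003, 1976, 1915, 2006],
    'YearRemodAdd': [2003, 1976, 1970, 2007],
    'YrSold':       [2008, 2007, 2006, 2008],
})

# Hypothetical helper columns:
# Remodeled is True when the remodel year differs from the construction year.
toy['Remodeled'] = toy['YearRemodAdd'] != toy['YearBuilt']
# Years between the last remodel (or construction) and the sale.
toy['YearsSinceRemod'] = toy['YrSold'] - toy['YearRemodAdd']

print(toy)
```

Swapping `toy` for `data_train` should work the same way, as long as the three columns are present.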

21.RoofStyle:

Type of roof

   Flat	Flat
   Gable	Gable
   Gambrel	Gambrel (Barn)
   Hip	Hip
   Mansard	Mansard
   Shed	Shed
column = 'RoofStyle'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Gable      1141
Hip         286
Flat         13
Gambrel      11
Mansard       7
Shed          2
Name: RoofStyle, dtype: int64

22.RoofMatl:

Roof material

   ClyTile	Clay or Tile
   CompShg	Standard (Composite) Shingle
   Membran	Membrane
   Metal	Metal
   Roll	Roll
   Tar&Grv	Gravel & Tar
   WdShake	Wood Shakes
   WdShngl	Wood Shingles
column = 'RoofMatl'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
CompShg    1434
Tar&Grv      11
WdShngl       6
WdShake       5
Metal         1
Roll          1
Membran       1
ClyTile       1
Name: RoofMatl, dtype: int64

23.Exterior1st:

Exterior covering on house

   AsbShng	Asbestos Shingles
   AsphShn	Asphalt Shingles
   BrkComm	Brick Common
   BrkFace	Brick Face
   CBlock	Cinder Block
   CemntBd	Cement Board
   HdBoard	Hard Board
   ImStucc	Imitation Stucco
   MetalSd	Metal Siding
   Other	Other
   Plywood	Plywood
   PreCast	PreCast	
   Stone	Stone
   Stucco	Stucco
   VinylSd	Vinyl Siding
   Wd Sdng	Wood Siding
   WdShing	Wood Shingles
column = 'Exterior1st'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
15
VinylSd    515
HdBoard    222
MetalSd    220
Wd Sdng    206
Plywood    108
CemntBd     61
BrkFace     50
WdShing     26
Stucco      25
AsbShng     20
BrkComm      2
Stone        2
AsphShn      1
CBlock       1
ImStucc      1
Name: Exterior1st, dtype: int64

24.Exterior2nd:

Exterior covering on house (if more than one material)

   AsbShng	Asbestos Shingles
   AsphShn	Asphalt Shingles
   BrkComm	Brick Common
   BrkFace	Brick Face
   CBlock	Cinder Block
   CemntBd	Cement Board
   HdBoard	Hard Board
   ImStucc	Imitation Stucco
   MetalSd	Metal Siding
   Other	Other
   Plywood	Plywood
   PreCast	PreCast
   Stone	Stone
   Stucco	Stucco
   VinylSd	Vinyl Siding
   Wd Sdng	Wood Siding
   WdShing	Wood Shingles
column = 'Exterior2nd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
16
VinylSd    504
MetalSd    214
HdBoard    207
Wd Sdng    197
Plywood    142
CmentBd     60
Wd Shng     38
Stucco      26
BrkFace     25
AsbShng     20
ImStucc     10
Brk Cmn      7
Stone        5
AsphShn      3
CBlock       1
Other        1
Name: Exterior2nd, dtype: int64
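
The printed Exterior2nd categories include `CmentBd`, `Wd Shng` and `Brk Cmn`, none of which appear in the description list, while `CemntBd`, `WdShing` and `BrkComm` do. Assuming these are variant spellings of the same categories (an assumption based only on the similar names, not confirmed by the data description), they could be mapped onto the documented codes so that Exterior1st and Exterior2nd share one vocabulary. The mapping name `EXTERIOR_FIXES` is made up for this sketch.

```python
import pandas as pd

# Assumed spelling variants -> codes used in the data description.
EXTERIOR_FIXES = {
    'CmentBd': 'CemntBd',
    'Wd Shng': 'WdShing',
    'Brk Cmn': 'BrkComm',
}

# Toy column standing in for data_train['Exterior2nd'].
exterior2nd = pd.Series(['VinylSd', 'CmentBd', 'Wd Shng', 'Brk Cmn', 'Stone'])

# Values not listed in the mapping are left unchanged.
harmonized = exterior2nd.replace(EXTERIOR_FIXES)
print(harmonized.tolist())
```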

25.MasVnrType:

Masonry veneer type

   BrkCmn	Brick Common
   BrkFace	Brick Face
   CBlock	Cinder Block
   None	None
   Stone	Stone
column = 'MasVnrType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
None       864
BrkFace    445
Stone      128
BrkCmn      15
Name: MasVnrType, dtype: int64

26.MasVnrArea:

Masonry veneer area in square feet

column = 'MasVnrArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
328
Min and max: 0.0 1600.0

27.ExterQual:

Evaluates the quality of the material on the exterior

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   Po	Poor
column = 'ExterQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
TA    906
Gd    488
Ex     52
Fa     14
Name: ExterQual, dtype: int64

28.ExterCond:

Evaluates the present condition of the material on the exterior

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   Po	Poor
column = 'ExterCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA    1282
Gd     146
Fa      28
Ex       3
Po       1
Name: ExterCond, dtype: int64
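
ExterQual and ExterCond use the same ordered scale (Po < Fa < TA < Gd < Ex), so one option is to encode them as integers that preserve the order. This is only a sketch: the 1 to 5 scores and the name `QUALITY_ORDER` are assumptions chosen for illustration, and the series is toy data rather than the real column.

```python
import pandas as pd

# Assumed integer scores that keep the order Po < Fa < TA < Gd < Ex.
QUALITY_ORDER = {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}

# Toy column standing in for data_train['ExterCond'].
exter_cond = pd.Series(['TA', 'Gd', 'Fa', 'Ex', 'Po'])

# Series.map looks up each label in the dict; labels missing from it become NaN.
exter_cond_score = exter_cond.map(QUALITY_ORDER)
print(exter_cond_score.tolist())
```

The same mapping could be reused for the other columns that share this scale, such as HeatingQC or KitchenQual.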

29.Foundation:

Type of foundation

   BrkTil	Brick & Tile
   CBlock	Cinder Block
   PConc	Poured Concrete
   Slab	Slab
   Stone	Stone
   Wood	Wood
column = 'Foundation'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
PConc     647
CBlock    634
BrkTil    146
Slab       24
Stone       6
Wood        3
Name: Foundation, dtype: int64

30.BsmtQual:

Evaluates the height of the basement

   Ex	Excellent (100+ inches)
   Gd	Good (90-99 inches)
   TA	Typical (80-89 inches)
   Fa	Fair (70-79 inches)
   Po	Poor (<70 inches)
   NA	No Basement
column = 'BsmtQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA    649
Gd    618
Ex    121
Fa     35
Name: BsmtQual, dtype: int64
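
The unique count printed above is 5, yet value_counts() lists only four levels. A likely explanation is that `pandas.read_csv` treats the string `NA` as a missing value by default, so the "No Basement" level ends up as NaN: `unique()` counts NaN, while `value_counts()` drops it unless told otherwise. The sketch below reproduces this on a tiny in-memory CSV (toy data, not train.csv).

```python
import io
import pandas as pd

# Toy CSV in which 'NA' is meant as a real category ("No Basement").
csv_text = "Id,BsmtQual\n1,Gd\n2,TA\n3,NA\n4,Ex\n5,NA\n"

# Default parsing: the string 'NA' becomes NaN.
default_read = pd.read_csv(io.StringIO(csv_text))

n_unique_with_nan = len(default_read['BsmtQual'].unique())  # NaN is counted
n_unique_no_nan = default_read['BsmtQual'].nunique()        # NaN is ignored

# Show the missing level explicitly, or restore it as a category.
counts_with_nan = default_read['BsmtQual'].value_counts(dropna=False)
filled = default_read['BsmtQual'].fillna('NA')

print(n_unique_with_nan, n_unique_no_nan)
print(counts_with_nan)
```

Whether to fill these NaN values with an explicit label is a modelling choice; for columns where the description defines NA as "no such feature", an explicit label keeps that information.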

31.BsmtCond:

Evaluates the general condition of the basement

   Ex	Excellent
   Gd	Good
   TA	Typical - slight dampness allowed
   Fa	Fair - dampness or some cracking or settling
   Po	Poor - Severe cracking, settling, or wetness
   NA	No Basement
column = 'BsmtCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA    1311
Gd      65
Fa      45
Po       2
Name: BsmtCond, dtype: int64

32.BsmtExposure:

Refers to walkout or garden level walls

   Gd	Good Exposure
   Av	Average Exposure (split levels or foyers typically score average or above)
   Mn	Minimum Exposure
   No	No Exposure
   NA	No Basement
column = 'BsmtExposure'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
No    953
Av    221
Gd    134
Mn    114
Name: BsmtExposure, dtype: int64

33.BsmtFinType1:

Rating of basement finished area

   GLQ	Good Living Quarters
   ALQ	Average Living Quarters
   BLQ	Below Average Living Quarters
   Rec	Average Rec Room
   LwQ	Low Quality
   Unf	Unfinished
   NA	No Basement
column = 'BsmtFinType1'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Unf    430
GLQ    418
ALQ    220
BLQ    148
Rec    133
LwQ     74
Name: BsmtFinType1, dtype: int64

34.BsmtFinSF1:

Type 1 finished square feet

column = 'BsmtFinSF1'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
637
Min and max: 0 5644

Outliers: >5000
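
One possible way to inspect rows beyond such a cutoff is a boolean mask. This is a minimal sketch on toy data: only the maximum of 5644 comes from the output above, the other values and the `THRESHOLD` name are illustrative, and whether to drop or keep such rows is left open here.

```python
import pandas as pd

# Toy rows standing in for data_train; 5644 is the maximum printed above.
toy = pd.DataFrame({
    'Id':         [1, 2, 3, 4],
    'BsmtFinSF1': [706, 0, 5644, 1369],
})

THRESHOLD = 5000  # cutoff noted in the text

# Rows flagged as outliers, and the remaining rows.
outliers = toy[toy['BsmtFinSF1'] > THRESHOLD]
cleaned = toy[toy['BsmtFinSF1'] <= THRESHOLD]

print(outliers)
print(len(cleaned))
```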

35.BsmtFinType2:

Rating of basement finished area (if multiple types)

   GLQ	Good Living Quarters
   ALQ	Average Living Quarters
   BLQ	Below Average Living Quarters
   Rec	Average Rec Room
   LwQ	Low Quality
   Unf	Unfinished
   NA	No Basement
column = 'BsmtFinType2'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Unf    1256
Rec      54
LwQ      46
BLQ      33
ALQ      19
GLQ      14
Name: BsmtFinType2, dtype: int64

36.BsmtFinSF2:

Type 2 finished square feet

column = 'BsmtFinSF2'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
144
Min and max: 0 1474

37.BsmtUnfSF:

Unfinished square feet of basement area

column = 'BsmtUnfSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
780
Min and max: 0 2336

38.TotalBsmtSF:

Total square feet of basement area

column = 'TotalBsmtSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
721
Min and max: 0 6110

Outliers: >5000
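
The three preceding columns (BsmtFinSF1, BsmtFinSF2, BsmtUnfSF) look like components of TotalBsmtSF. Assuming that relationship holds (the description does not state it explicitly), a quick consistency check can flag rows where the parts do not add up to the total. The frame below is toy data with one deliberately inconsistent row.

```python
import pandas as pd

# Toy rows standing in for data_train; the last row is deliberately inconsistent.
toy = pd.DataFrame({
    'BsmtFinSF1':  [706, 978, 0],
    'BsmtFinSF2':  [0, 0, 0],
    'BsmtUnfSF':   [150, 284, 952],
    'TotalBsmtSF': [856, 1262, 900],
})

# Sum the assumed components row by row and compare with the reported total.
parts = toy[['BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF']].sum(axis=1)
mismatch = toy[parts != toy['TotalBsmtSF']]

print(mismatch)
```

If the check comes back empty on the real data, TotalBsmtSF carries no information beyond its three parts, which is worth knowing before feeding all four columns to a linear model.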

39.Heating:

Type of heating

   Floor	Floor Furnace
   GasA	Gas forced warm air furnace
   GasW	Gas hot water or steam heat
   Grav	Gravity furnace	
   OthW	Hot water or steam heat other than gas
   Wall	Wall furnace
column = 'Heating'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
GasA     1428
GasW       18
Grav        7
Wall        4
OthW        2
Floor       1
Name: Heating, dtype: int64

40.HeatingQC:

Heating quality and condition

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   Po	Poor
column = 'HeatingQC'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Ex    741
TA    428
Gd    241
Fa     49
Po      1
Name: HeatingQC, dtype: int64

41.CentralAir:

Central air conditioning

   N	No
   Y	Yes
column = 'CentralAir'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
Y    1365
N      95
Name: CentralAir, dtype: int64

42.Electrical:

Electrical system

   SBrkr	Standard Circuit Breakers & Romex
   FuseA	Fuse Box over 60 AMP and all Romex wiring (Average)	
   FuseF	60 AMP Fuse Box and mostly Romex wiring (Fair)
   FuseP	60 AMP Fuse Box and mostly knob & tube wiring (poor)
   Mix	Mixed
column = 'Electrical'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
SBrkr    1334
FuseA      94
FuseF      27
FuseP       3
Mix         1
Name: Electrical, dtype: int64

43.1stFlrSF:

First Floor square feet

column = '1stFlrSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
753
Min and max: 334 4692

Possible outliers: >4000

44.2ndFlrSF:

Second floor square feet

column = '2ndFlrSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
417
Min and max: 0 2065

45.LowQualFinSF:

Low quality finished square feet (all floors)

column = 'LowQualFinSF'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
24
0      1434
80        3
360       2
528       1
53        1
120       1
144       1
156       1
205       1
232       1
234       1
371       1
572       1
390       1
392       1
397       1
420       1
473       1
479       1
481       1
513       1
514       1
515       1
384       1
Name: LowQualFinSF, dtype: int64
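
The counts above show that this column is almost entirely zeros. To put a number on that, the sketch below rebuilds the column from the printed value_counts (so it does not need train.csv) and computes the share of zeros; the variable names are illustrative.

```python
import pandas as pd

# Rebuild the column from the value_counts printed above:
# 1434 zeros, three 80s, two 360s, and 21 values that occur once each.
singletons = [528, 53, 120, 144, 156, 205, 232, 234, 371, 572, 390,
              392, 397, 420, 473, 479, 481, 513, 514, 515, 384]
low_qual = pd.Series([0] * 1434 + [80] * 3 + [360] * 2 + singletons)

# Fraction of houses with no low-quality finished area.
zero_share = (low_qual == 0).mean()

print(len(low_qual), low_qual.nunique(), round(zero_share, 4))
```

With roughly 98% of the 1460 rows at zero, the column carries little variation, which may be a reason to binarize it or leave it out later.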

column = 'LowQualFinSF'
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 0 572

46.GrLivArea:

Above grade (ground) living area square feet

column = 'GrLivArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
861
Min and max: 334 5642

47.BsmtFullBath:

Basement full bathrooms

column = 'BsmtFullBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
0    856
1    588
2     15
3      1
Name: BsmtFullBath, dtype: int64

48.BsmtHalfBath:

Basement half bathrooms

column = 'BsmtHalfBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
0    1378
1      80
2       2
Name: BsmtHalfBath, dtype: int64

49.FullBath:

Full bathrooms above grade

column = 'FullBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
2    768
1    650
3     33
0      9
Name: FullBath, dtype: int64

50.HalfBath:

Half baths above grade

column = 'HalfBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
0    913
1    535
2     12
Name: HalfBath, dtype: int64

51.BedroomAbvGr:

Bedrooms above grade (does NOT include basement bedrooms)

column = 'BedroomAbvGr'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
3    804
2    358
4    213
1     50
5     21
6      7
0      6
8      1
Name: BedroomAbvGr, dtype: int64

52.KitchenAbvGr:

Kitchens above grade

column = 'KitchenAbvGr'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
1    1392
2      65
3       2
0       1
Name: KitchenAbvGr, dtype: int64

53.KitchenQual:

Kitchen quality

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor
column = 'KitchenQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
TA    735
Gd    586
Ex    100
Fa     39
Name: KitchenQual, dtype: int64

54.TotRmsAbvGrd:

Total rooms above grade (does not include bathrooms)

column = 'TotRmsAbvGrd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
12
6     402
7     329
5     275
8     187
4      97
9      75
10     47
11     18
3      17
12     11
14      1
2       1
Name: TotRmsAbvGrd, dtype: int64

55.Functional:

Home functionality (Assume typical unless deductions are warranted)

   Typ	Typical Functionality
   Min1	Minor Deductions 1
   Min2	Minor Deductions 2
   Mod	Moderate Deductions
   Maj1	Major Deductions 1
   Maj2	Major Deductions 2
   Sev	Severely Damaged
   Sal	Salvage only
column = 'Functional'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Typ     1360
Min2      34
Min1      31
Mod       15
Maj1      14
Maj2       5
Sev        1
Name: Functional, dtype: int64

56.Fireplaces:

Number of fireplaces

column = 'Fireplaces'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
0    690
1    650
2    115
3      5
Name: Fireplaces, dtype: int64

57.FireplaceQu:

Fireplace quality

   Ex	Excellent - Exceptional Masonry Fireplace
   Gd	Good - Masonry Fireplace in main level
   TA	Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
   Fa	Fair - Prefabricated Fireplace in basement
   Po	Poor - Ben Franklin Stove
   NA	No Fireplace
column = 'FireplaceQu'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Gd    380
TA    313
Fa     33
Ex     24
Po     20
Name: FireplaceQu, dtype: int64

58.GarageType:

Garage location

   2Types	More than one type of garage
   Attchd	Attached to home
   Basment	Basement Garage
   BuiltIn	Built-In (Garage part of house - typically has room above garage)
   CarPort	Car Port
   Detchd	Detached from home
   NA	No Garage
column = 'GarageType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Attchd     870
Detchd     387
BuiltIn     88
Basment     19
CarPort      9
2Types       6
Name: GarageType, dtype: int64

59.GarageYrBlt:

Year garage was built

column = 'GarageYrBlt'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
98
2005.0    65
2006.0    59
2004.0    53
2003.0    50
2007.0    49
1977.0    35
1998.0    31
1999.0    30
2008.0    29
1976.0    29
2000.0    27
2002.0    26
1968.0    26
1950.0    24
1993.0    22
2009.0    21
1965.0    21
1966.0    21
1962.0    21
1958.0    21
2001.0    20
1996.0    20
1957.0    20
1970.0    20
1960.0    19
1997.0    19
1978.0    19
1954.0    19
1974.0    18
1994.0    18
1995.0    18
1964.0    18
1959.0    17
1963.0    16
1990.0    16
1956.0    16
1969.0    15
1979.0    15
1980.0    15
1967.0    15
1988.0    14
1973.0    14
1940.0    14
1920.0    14
1972.0    14
1961.0    13
1971.0    13
1955.0    13
1992.0    13
1953.0    12
1987.0    11
1948.0    11
1985.0    10
1981.0    10
1941.0    10
1925.0    10
1989.0    10
1975.0     9
1991.0     9
1939.0     9
1984.0     8
1949.0     8
1930.0     8
1983.0     7
1986.0     6
1951.0     6
1926.0     6
1922.0     5
1936.0     5
1916.0     5
1931.0     4
1945.0     4
1935.0     4
1928.0     4
1946.0     4
1982.0     4
1938.0     3
1921.0     3
1924.0     3
1910.0     3
1952.0     3
1932.0     3
2010.0     3
1923.0     3
1937.0     2
1934.0     2
1918.0     2
1947.0     2
1929.0     2
1914.0     2
1915.0     2
1942.0     2
1908.0     1
1927.0     1
1933.0     1
1900.0     1
1906.0     1
Name: GarageYrBlt, dtype: int64

print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 1900.0 2010.0

60.GarageFinish:

Interior finish of the garage

   Fin	Finished
   RFn	Rough Finished	
   Unf	Unfinished
   NA	No Garage
column = 'GarageFinish'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Unf    605
RFn    422
Fin    352
Name: GarageFinish, dtype: int64

61.GarageCars:

Size of garage in car capacity

column = 'GarageCars'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
2    824
1    369
3    181
0     81
4      5
Name: GarageCars, dtype: int64

62.GarageArea:

Size of garage in square feet

column = 'GarageArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
441
Min and max: 0 1418

63.GarageQual:

Garage quality

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor
   NA	No Garage
column = 'GarageQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
TA    1311
Fa      48
Gd      14
Po       3
Ex       3
Name: GarageQual, dtype: int64

64.GarageCond:

Garage condition

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor
   NA	No Garage
column = 'GarageCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
TA    1326
Fa      35
Gd       9
Po       7
Ex       2
Name: GarageCond, dtype: int64

65.PavedDrive:

Paved driveway

   Y	Paved 
   P	Partial Pavement
   N	Dirt/Gravel
column = 'PavedDrive'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Y    1340
N      90
P      30
Name: PavedDrive, dtype: int64

66.WoodDeckSF:

Wood deck area in square feet

column = 'WoodDeckSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
274
Min and max: 0 857

67.OpenPorchSF:

Open porch area in square feet

column = 'OpenPorchSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
202
Min and max: 0 547

68.EnclosedPorch:

Enclosed porch area in square feet

column = 'EnclosedPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
120
Min and max: 0 552

69.3SsnPorch:

Three season porch area in square feet

column = '3SsnPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
20
Min and max: 0 508

70.ScreenPorch:

Screen porch area in square feet

column = 'ScreenPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
76
Min and max: 0 480

71.PoolArea:

Pool area in square feet

column = 'PoolArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
8
Min and max: 0 738

72.PoolQC:

Pool quality

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   NA	No Pool
column = 'PoolQC'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Gd    3
Fa    2
Ex    2
Name: PoolQC, dtype: int64

73.Fence:

Fence quality

   GdPrv	Good Privacy
   MnPrv	Minimum Privacy
   GdWo	Good Wood
   MnWw	Minimum Wood/Wire
   NA	No Fence
column = 'Fence'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
MnPrv    157
GdPrv     59
GdWo      54
MnWw      11
Name: Fence, dtype: int64

74.MiscFeature:

Miscellaneous feature not covered in other categories

   Elev	Elevator
   Gar2	2nd Garage (if not described in garage section)
   Othr	Other
   Shed	Shed (over 100 SF)
   TenC	Tennis Court
   NA	None
column = 'MiscFeature'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Shed    49
Othr     2
Gar2     2
TenC     1
Name: MiscFeature, dtype: int64

75.MiscVal:

$Value of miscellaneous feature

column = 'MiscVal'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
21
0        1408
400        11
500         8
700         5
450         4
2000        4
600         4
1200        2
480         2
1150        1
800         1
15500       1
620         1
3500        1
560         1
2500        1
1300        1
1400        1
350         1
8300        1
54          1
Name: MiscVal, dtype: int64

print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 0 15500

76.MoSold:

Month Sold (MM)

column = 'MoSold'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
12
6     253
7     234
5     204
4     141
8     122
3     106
10     89
11     79
9      63
12     59
1      58
2      52
Name: MoSold, dtype: int64

77.YrSold:

Year Sold (YYYY)

column = 'YrSold'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
2009    338
2007    329
2006    314
2008    304
2010    175
Name: YrSold, dtype: int64

78.SaleType:

Type of sale

   WD 	Warranty Deed - Conventional
   CWD	Warranty Deed - Cash
   VWD	Warranty Deed - VA Loan
   New	Home just constructed and sold
   COD	Court Officer Deed/Estate
   Con	Contract 15% Down payment regular terms
   ConLw	Contract Low Down payment and low interest
   ConLI	Contract Low Interest
   ConLD	Contract Low Down
   Oth	Other
column = 'SaleType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
WD       1267
New       122
COD        43
ConLD       9
ConLI       5
ConLw       5
CWD         4
Oth         3
Con         2
Name: SaleType, dtype: int64

79. SaleCondition:

Condition of sale

   Normal	Normal Sale
   Abnorml	Abnormal Sale -  trade, foreclosure, short sale
   AdjLand	Adjoining Land Purchase
   Alloca	Allocation - two linked properties with separate deeds, typically condo with a garage unit
   Family	Sale between family members
   Partial	Home was not completed when last assessed (associated with New Homes)
column = 'SaleCondition'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Normal     1198
Partial     125
Abnorml     101
Family       20
Alloca       12
AdjLand       4
Name: SaleCondition, dtype: int64

80. SalePrice
column = 'SalePrice'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
663
Min and max: 34900 755000
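
SalePrice spans more than an order of magnitude, which is why the competition metric compares logarithms of prices. The sketch below shows the idea with plain natural logs, as described in the evaluation section; the function name `rmse_log` is made up for this example and the predictions are synthetic.

```python
import numpy as np

def rmse_log(y_true, y_pred):
    """RMSE between log(predicted price) and log(observed price)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Over-predict the cheapest and the most expensive training price by 10% each.
cheap = rmse_log([34900], [34900 * 1.1])
expensive = rmse_log([755000], [755000 * 1.1])

# Both errors are about log(1.1): the same relative miss costs the same,
# regardless of the price level.
print(cheap, expensive)
```

This also suggests training on the log of SalePrice, so that the model's loss lines up with the leaderboard score.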

Source: https://blog.csdn.net/weixin_45004761/article/details/114840701