
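ML / FE: Using feature engineering (analyzing pairwise correlations between numeric features) on the AllstateClaimsSeverity dataset (Kaggle 2016 competition) to predict claim cost by regression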

Author: Internet

Contents

Output

Design approach

Core code


Output

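1. Dataset overview

See "Dataset: AllstateClaimsSeverity": a detailed guide to the AllstateClaimsSeverity dataset (Kaggle 2016 competition), covering its introduction, download, and example applications.
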

2. Data visualization

T1. Plot a heatmap of the correlations

T2. Plot scatter plots of the highly correlated pairs
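
A minimal sketch of how a heatmap like the one in T1 could be drawn. The post does not show this step, so the helper name `plot_corr_heatmap`, the synthetic DataFrame `demo`, and its column names are assumptions made for illustration only. In practice the numeric columns of the training set would take the place of `demo`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open plot windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_corr_heatmap(df):
    """Draw a heatmap of pairwise Pearson correlations and return (corr, ax)."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the color scale to [-1, 1] so that zero correlation sits mid-scale
    sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm",
                annot=True, fmt=".2f", ax=ax)
    ax.set_title("Correlation between numeric features")
    return corr, ax


# Synthetic stand-in data (hypothetical, NOT the Kaggle dataset)
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "cont1": base,
    "cont2": base + rng.normal(scale=0.1, size=200),  # nearly a copy of cont1
    "cont3": rng.normal(size=200),                    # independent noise
})

corr, ax = plot_corr_heatmap(demo)
```

Pinning `vmin` and `vmax` keeps the colors comparable across different feature sets, which helps when only a few pairs are strongly correlated.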

 

 

Design approach

Core code

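import matplotlib.pyplot as plt
import seaborn as sns

# Assumed to exist from earlier steps not shown here: `train` (training DataFrame),
# `cols` (its column names), `data_corr` (correlation matrix of the numeric
# features) and `size` (number of numeric features).
threshold = 0.5     # minimum absolute correlation for a pair to be reported
corr_list = []      # collected [correlation, i, j] entries

# Scan the upper triangle so that each pair of features is checked once
for i in range(size):
    for j in range(i + 1, size):
        c = data_corr.iloc[i, j]
        # Keep strong positive (excluding exactly 1) or strong negative correlations
        if (threshold <= c < 1) or c <= -threshold:
            corr_list.append([c, i, j])

# Sort by absolute correlation, strongest first, and print each pair
s_corr_list = sorted(corr_list, key=lambda x: -abs(x[0]))
for v, i, j in s_corr_list:
    print("%s and %s = %.2f" % (cols[i], cols[j], v))

# Scatter plot for each highly correlated pair
for v, i, j in s_corr_list:
    # `height` replaces the `size` argument, which was renamed in seaborn 0.9
    sns.pairplot(train, height=6, x_vars=[cols[i]], y_vars=[cols[j]])
    plt.title('AllstateClaimsSeverity: Scatter plot of only the highly correlated pairs')
    plt.show()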
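
The pair selection above can also be sketched as a small self-contained function that runs without the Kaggle files. The function name `high_corr_pairs` and the toy DataFrame `demo` are hypothetical and not part of the original post. Unlike the loop above, this sketch filters on the absolute value alone, so a pair with correlation exactly 1 is kept rather than dropped.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.5):
    """Return (col_a, col_b, corr) for pairs with |corr| >= threshold, strongest first."""
    corr = df.corr()
    cols = corr.columns
    # k=1 selects the strict upper triangle: every pair once, diagonal excluded
    rows, columns = np.triu_indices(len(cols), k=1)
    pairs = []
    for i, j in zip(rows, columns):
        value = corr.iloc[i, j]
        if abs(value) >= threshold:
            pairs.append((cols[i], cols[j], value))
    # Sort by absolute correlation so strong negative pairs rank with positive ones
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Toy data (hypothetical values chosen for illustration)
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 3, 4, 1, 2],
    "d": [1, -1, 1, -1, 1],
})

pairs = high_corr_pairs(demo, threshold=0.5)
for col_a, col_b, value in pairs:
    print("%s and %s = %.2f" % (col_a, col_b, value))
```

Returning column names instead of positional indices makes the result usable without a separate `cols` lookup, for example to pass each pair directly to a plotting call.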

 

 

 

Tags: Kaggle2016, ML, AllstateClaimsSeverity, FE, corr, iloc, data
Source: https://blog.51cto.com/u_14217737/2905683