
[Ensemble Learning (Part 2)] Task14: Happiness Prediction

Author: 互联网


Preface

1. Objectives and Requirements
Understand the principles of the k-nearest-neighbors (KNN) algorithm and learn to apply it in a practical prediction task.

2. Main Content
Example: diabetes prediction
Task: predict diabetes among the Pima Indians
Data sources:

  1. https://www.kaggle.com/uciml/pima-indians-diabetes-database
  2. pima-indians-diabetes, in the Lab 1 folder
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, precision_score, \
    recall_score, f1_score, cohen_kappa_score
from collections import Counter
from sklearn.metrics import roc_curve, auc
data = pd.read_csv('./diabetes.csv')

Data Exploration

data.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1

Data Description

data.describe()

       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin         BMI  DiabetesPedigreeFunction         Age     Outcome
count   768.000000  768.000000     768.000000     768.000000  768.000000  768.000000                768.000000  768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479   31.992578                  0.471876   33.240885    0.348958
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.884160                  0.331329   11.760232    0.476951
min       0.000000    0.000000       0.000000       0.000000    0.000000    0.000000                  0.078000   21.000000    0.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000   27.300000                  0.243750   24.000000    0.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000   32.000000                  0.372500   29.000000    0.000000
75%       6.000000  140.250000      80.000000      32.000000  127.250000   36.600000                  0.626250   41.000000    1.000000
max      17.000000  199.000000     122.000000      99.000000  846.000000   67.100000                  2.420000   81.000000    1.000000
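One thing the summary table reveals: the minimum of Glucose, BloodPressure, SkinThickness, Insulin, and BMI is 0, which is physiologically implausible and in this dataset really means "not measured". A minimal sketch, using only the five rows shown by data.head() above, of how such zeros can be counted:

```python
import pandas as pd

# The five rows from data.head(); in the clinical columns a 0 is a
# missing measurement, not a real reading, so counting zeros per
# column shows how much of the data is effectively missing.
head = pd.DataFrame({
    'Glucose':       [148, 85, 183, 89, 137],
    'BloodPressure': [72, 66, 64, 66, 40],
    'SkinThickness': [35, 29, 0, 23, 35],
    'Insulin':       [0, 0, 0, 94, 168],
    'BMI':           [33.6, 26.6, 23.3, 28.1, 43.1],
})
zero_counts = (head == 0).sum()  # zeros per column
print(zero_counts)
```

Even in these five rows, Insulin is zero three times; on the full 768 rows the same expression quantifies how many "hidden" missing values each column carries.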
data.shape
(768, 9)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
data['Outcome'].value_counts()
0    500
1    268
Name: Outcome, dtype: int64

Brief Summary

All 768 rows are complete (no null values), but the classes are imbalanced: 500 negative samples versus 268 positive ones.

Model Building

Data Normalization

new_data = data.drop(['Outcome'], axis=1)
scale = MinMaxScaler().fit(new_data)   ## fit the scaling rule
biao_data = scale.transform(new_data)  ## apply the rule
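MinMaxScaler maps each feature to [0, 1] via (x - min) / (max - min), which matters for KNN because the algorithm is distance-based and would otherwise be dominated by large-range columns such as Insulin. A quick sanity-check sketch on a toy array (not the Pima data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Verify what MinMaxScaler computes: each column is rescaled by
# (x - col_min) / (col_max - col_min).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])
scaled = MinMaxScaler().fit_transform(X)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
assert np.allclose(scaled, manual)
print(scaled)  # every column now spans exactly [0, 1]
```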

Splitting the Training and Test Sets

X_train, X_test, y_train, y_test = train_test_split(biao_data, data['Outcome'], test_size=0.2, random_state=123)
# Model training
k = 5
clf = KNeighborsClassifier(n_neighbors=k)
clf.fit(X_train, y_train)
KNeighborsClassifier()
y_pred = clf.predict(X_test)
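Above, k = 5 is simply fixed; a common refinement is to choose n_neighbors by cross-validation. A sketch of that idea, using make_classification as a stand-in for the scaled Pima features (the sample and feature counts here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Select n_neighbors by 5-fold cross-validated accuracy instead of
# fixing k = 5 by hand.
X, y = make_classification(n_samples=300, n_features=8, random_state=123)
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': list(range(1, 16))},
                    cv=5, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

GridSearchCV then refits the best model on all the training data, so grid.predict can be used directly in place of clf.predict.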

Model Evaluation

fpr, tpr, threshold = roc_curve(y_test, y_pred)
print('AUC:', auc(fpr, tpr))
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred))
print("Cohen's kappa:", cohen_kappa_score(y_test, y_pred))
print('Counter:', Counter(y_pred))
AUC: 0.7634698275862069
Accuracy: 0.7987012987012987
Precision: 0.8
Recall: 0.6206896551724138
F1 score: 0.6990291262135923
Cohen's kappa: 0.5514001127607593
Counter: Counter({0: 109, 1: 45})

Conclusion

I still have a long way to go.


Source: https://blog.csdn.net/jcjic/article/details/116989146