Decision Tree API and a Titanic Survival Prediction Case Study

Author: Internet

I. Decision Tree API

In sklearn, a decision tree classifier is built with sklearn.tree.DecisionTreeClassifier(criterion='gini', max_depth=None, random_state=None).

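Where:

criterion: the function used to measure split quality. The default 'gini' uses Gini impurity; 'entropy' uses information gain.
max_depth: the maximum depth of the tree. With the default None, nodes keep splitting until every leaf is pure or too small to split further.
random_state: the random seed. Fix it to get reproducible results.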
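To make the constructor arguments concrete, here is a minimal sketch on a made-up four-row dataset (the data and variable names are illustrative assumptions, not part of the original post). The label depends only on the first feature, so a single split should be enough to separate the classes.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data, invented for illustration: the label equals the first feature
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

# criterion selects the split-quality measure, max_depth caps tree growth,
# random_state makes tie-breaking between features reproducible
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)

predictions = clf.predict([[1, 0], [0, 1]])
depth = clf.get_depth()
print(predictions, depth)
```

Here max_depth=2 is only an upper bound; the fitted tree stops earlier once its leaves are pure.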

II. Titanic Survival Prediction Case Study

Code:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer


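# Load the data
titan_data = pd.read_csv('./titan/train.csv')

# Preprocessing: select the features and the target
# .copy() avoids pandas' SettingWithCopyWarning when Age is filled below
x = titan_data[["Pclass", "Sex", "Age"]].copy()
y = titan_data["Survived"]
# Fill missing Age values with the mean age
x["Age"] = x["Age"].fillna(x["Age"].mean())
# Split into training and test sets (the default test size is 25%)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)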

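# Feature extraction: DictVectorizer one-hot encodes the string column Sex;
# numeric columns (Pclass, Age) pass through unchanged
transfer = DictVectorizer(sparse=True)
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
# Reuse the mapping learned on the training set; do not refit on the test set
x_test = transfer.transform(x_test.to_dict(orient="records"))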

# Train the model
estimator = DecisionTreeClassifier(criterion="entropy", max_depth=13)
estimator.fit(x_train, y_train)

# Evaluate the model: mean accuracy on the test set
print(estimator.score(x_test, y_test))
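The feature-extraction step is easier to follow on a tiny example. The sketch below uses three hand-written passenger records (invented for illustration, not taken from the Kaggle file) to show what DictVectorizer produces, and why the test records should go through transform rather than a second fit_transform: the column layout learned from the training records is reused as-is.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records in the same shape as x.to_dict(orient="records")
train_records = [
    {"Pclass": 3, "Sex": "male", "Age": 22.0},
    {"Pclass": 1, "Sex": "female", "Age": 38.0},
]
test_records = [{"Pclass": 2, "Sex": "female", "Age": 30.0}]

# sparse=False returns a dense array, which is easier to inspect
vec = DictVectorizer(sparse=False)
train_matrix = vec.fit_transform(train_records)  # learns the columns
test_matrix = vec.transform(test_records)        # reuses the same columns

feature_names = vec.feature_names_
print(feature_names)
print(test_matrix)
```

String values become one column per category (Sex=female, Sex=male), while numeric values keep a single column each. Refitting on the test set could yield a different column order or count if a category is missing there, which would silently misalign the features the tree was trained on.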

 

The training data comes from the Kaggle platform: https://www.kaggle.com/c/titanic/overview

Source: https://blog.csdn.net/qq_39197555/article/details/115331307