7. Logistic Regression in Practice

2020-04-28 11:05:14 Author: Internet

1. How does logistic regression prevent overfitting? Why can regularization prevent overfitting? (Explain in your own words.)

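Ways to prevent overfitting:

L1 regularization: strengthening the penalty term drives more parameters to exactly 0. This parameter sparsity lowers model complexity and so resists overfitting.

L2 regularization: shrinks all parameters toward 0 so that they stay small, which reduces how much the model fluctuates (its variance) and so resists overfitting.

Increase the sample size.

Reduce the number of features through feature selection.

EDA: explore the data to find discriminative features.

Feature derivation: keep deriving new features that are strong combinations of existing ones.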
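The contrast between the L1 and L2 penalties described above can be checked with a small experiment. The sketch below is not from the original post. It assumes synthetic data from `make_classification`, an arbitrary regularization strength `C=0.05`, the `liblinear` solver, and a scikit-learn version that accepts the `penalty` argument. The helper name `zero_and_max_coef` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def zero_and_max_coef(penalty, C=0.05, seed=0):
    """Fit a penalized logistic regression on synthetic data.

    Returns (number of coefficients that are exactly 0, largest |coefficient|).
    """
    # Assumed toy data: 30 features, only 5 of which carry signal.
    X, y = make_classification(
        n_samples=300, n_features=30, n_informative=5,
        n_redundant=0, random_state=seed,
    )
    X = StandardScaler().fit_transform(X)

    # liblinear supports both 'l1' and 'l2'.
    # In scikit-learn a smaller C means a stronger penalty.
    model = LogisticRegression(penalty=penalty, C=C, solver="liblinear")
    model.fit(X, y)

    coefs = model.coef_.ravel()
    return int(np.sum(coefs == 0)), float(np.abs(coefs).max())


if __name__ == "__main__":
    for penalty in ("l1", "l2"):
        zeros, largest = zero_and_max_coef(penalty)
        print(penalty, "zero coefficients:", zeros, "largest |coef|:", largest)
```

Under these assumptions the L1 fit should report more exactly-zero coefficients than the L2 fit, which keeps every coefficient small but nonzero. Lowering `C` further should strengthen both effects.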

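Regularization is one way of implementing the structural risk minimization strategy. A penalty on the parameters is added to the training loss, so the model has to trade goodness of fit on the training set against the size of its parameters.

By lowering model complexity in this way, it achieves a smaller generalization error and reduces the degree of overfitting.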
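Written out as a formula, structural risk minimization for logistic regression takes roughly the following form. The notation is an assumption made here for illustration and does not appear in the original post: labels $y_i \in \{-1, +1\}$, weights $w$, bias $b$, $N$ training samples, and a penalty weight $\lambda \ge 0$.

```latex
\min_{w,\,b}\;
\underbrace{\frac{1}{N}\sum_{i=1}^{N}
  \log\!\bigl(1 + e^{-y_i\,(w^{\top}x_i + b)}\bigr)}_{\text{empirical risk}}
\;+\;
\underbrace{\lambda\,J(w)}_{\text{complexity penalty}},
\qquad
J(w) = \lVert w \rVert_1 \ \text{(L1)}
\quad\text{or}\quad
J(w) = \tfrac{1}{2}\lVert w \rVert_2^2 \ \text{(L2)}
```

With $\lambda = 0$ only the fit to the training data matters. A larger $\lambda$ pushes $w$ toward 0. In scikit-learn's `LogisticRegression` the parameter `C` plays the role of an inverse of $\lambda$, so a smaller `C` means stronger regularization.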

2. Practice with logistic regression; any dataset may be used.

Source code:

from sklearn.linear_model import LogisticRegression  # logistic regression API
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

import numpy as np
import pandas as pd

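def logistic():
    '''Predict whether a tumor is benign or malignant with logistic regression.'''
    # The first column is the sample ID, the last column ('Attribute 10') is the class label
    column = [
        'Sample ID', 'Attribute 1', 'Attribute 2', 'Attribute 3',
        'Attribute 4', 'Attribute 5', 'Attribute 6', 'Attribute 7',
        'Attribute 8', 'Attribute 9', 'Attribute 10'
    ]
    # Read the data
    cancer = pd.read_csv('D:\\Python1\\dasan - new\\机器学习\\breast-cancer-wisconsin.csv', names=column)
    # Handle missing values: '?' marks a missing entry, so drop those rows
    cancer = cancer.replace(to_replace='?', value=np.nan)
    cancer = cancer.dropna()

    # Split into training and test sets (features: columns 1-9, label: column 10)
    x_train, x_test, y_train, y_test = train_test_split(
        cancer[column[1:10]], cancer[column[10]], test_size=0.3
    )

    # Standardize the features; fit the scaler on the training set only
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # Fit logistic regression and predict
    lg = LogisticRegression()
    lg.fit(x_train, y_train)
    print(lg.coef_)
    lg_predict = lg.predict(x_test)
    print("Accuracy:", lg.score(x_test, y_test))
    # The report includes precision, recall and F1; class 2 = benign, class 4 = malignant
    print("Classification report:\n", classification_report(y_test, lg_predict, labels=[2, 4], target_names=['benign', 'malignant']))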

if __name__ == '__main__':
    logistic()
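The script above depends on a CSV file at a local path. For readers without that file, the same workflow can be sketched on the breast cancer dataset bundled with scikit-learn. This is an assumption-laden variant, not the author's code: `load_breast_cancer` is the Wisconsin Diagnostic dataset (30 features, labels 0 = malignant and 1 = benign), which differs from the 9-attribute CSV used above. The function name `logistic_builtin`, the fixed `random_state`, the stratified split, and `max_iter=1000` are all choices made here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def logistic_builtin(seed=0):
    """Train and evaluate logistic regression on scikit-learn's bundled data.

    Returns (test accuracy, classification report text).
    """
    X, y = load_breast_cancer(return_X_y=True)

    # Same 70/30 split as the script above; seed and stratify are added assumptions.
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )

    # The pipeline fits the scaler on the training data only,
    # then applies the same scaling when predicting on the test data.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(x_train, y_train)

    accuracy = model.score(x_test, y_test)
    report = classification_report(
        y_test, model.predict(x_test),
        labels=[0, 1], target_names=["malignant", "benign"],
    )
    return accuracy, report


if __name__ == "__main__":
    acc, rep = logistic_builtin()
    print("Accuracy:", acc)
    print(rep)
```

Wrapping the scaler and the classifier in one pipeline keeps the "fit on train, transform on test" rule from the original script in a single object, which also makes it easier to pass to cross-validation later.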

Source: https://www.cnblogs.com/fzwboke/p/12787534.html