
【深度之眼】【百面机器学习】PCA Dimensionality Reduction



Key points

sklearn.decomposition.PCA()

Parameters

The key constructor parameter used below is n_components, the number of components to keep. Besides the input parameters, several attributes of the PCA class are worth looking at, for example explained_variance_ratio_ (the fraction of variance explained by each retained component) and components_ (the principal axes themselves).
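A minimal sketch on toy random data (my own example, just to show how these attributes are read from a fitted model):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 5)   # toy data: 100 samples, 5 features
pca = PCA(n_components=2)                   # keep the 2 strongest components
pca.fit(X)
print(pca.explained_variance_ratio_)        # fraction of variance per kept component
print(pca.components_.shape)                # (2, 5): one principal axis per row
print(pca.n_components_)                    # number of components actually kept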

PCA object methods

fit() is arguably the most universal method in scikit-learn: every algorithm that needs training has a fit() method, and it is exactly the "training" step of the algorithm. Because PCA is an unsupervised learning algorithm, y is simply None here.

fit(X) trains the PCA model on the data X.

Return value: the object on which fit was called. For example, pca.fit(X) trains the pca object on X.
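Because fit() returns the fitted estimator itself, construction and training can be chained into one expression (a small sketch; x_train is assumed to come from the split in the complete script below):

from sklearn.decomposition import PCA
pca = PCA(n_components=2).fit(x_train)   # fit() returns the fitted PCA object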

fit_transform(X) trains the PCA model on X and returns the dimension-reduced data in one step:
newX = pca.fit_transform(X), where newX is the reduced data.

inverse_transform(newX) maps the reduced data back to the original feature space: X = pca.inverse_transform(newX). Note that unless all components were kept, this only recovers an approximation of the original data.
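A short round-trip sketch on toy data (my own example, not from the original post): because only 2 of the 5 components are kept, inverse_transform returns an approximation of X rather than X itself.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 5)
pca = PCA(n_components=2)
newX = pca.fit_transform(X)                  # fit and reduce in one call
X_back = pca.inverse_transform(newX)         # map back to the original 5-D space
print(X.shape, newX.shape, X_back.shape)     # (100, 5) (100, 2) (100, 5)
print(np.abs(X - X_back).max())              # non-zero: the reconstruction is approximate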

transform(X) projects X into the reduced space. Once the model has been trained, any new data can be reduced with the transform method.

In addition, there are methods such as get_covariance(), get_precision(), get_params(deep=True) and score(X, y=None); look them up in the scikit-learn documentation if you need them.
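For completeness, a quick sketch of what those methods return on a fitted PCA (toy data again; see the scikit-learn docs for the exact definitions):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 5)
pca = PCA(n_components=2).fit(X)
print(pca.get_covariance().shape)   # (5, 5) covariance estimated from the kept components
print(pca.get_precision().shape)    # (5, 5) inverse of that covariance estimate
print(pca.get_params(deep=True))    # the constructor parameters as a dict
print(pca.score(X))                 # average log-likelihood of X under the PCA model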

Code

Required libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# K-nearest-neighbor classifier
from sklearn.neighbors import KNeighborsClassifier
# PCA
from sklearn.decomposition import PCA

KNN

# k-NN baseline on the original 64-dimensional digit features
# (x_train / x_test / y_train / y_test come from the train_test_split in the complete script below)
knn_clf = KNeighborsClassifier()
knn_clf.fit(x_train, y_train)
acc = knn_clf.score(x_test, y_test)
print("knn acc is %s" % acc)

PCA

# PCA
pca = PCA(n_components=2)
pca.fit(x_train)  # PCA is unsupervised, so no y is needed
x_train_reduction = pca.transform(x_train)
x_test_reduction = pca.transform(x_test)

knn_clf = KNeighborsClassifier()
knn_clf.fit(x_train_reduction, y_train)
acc = knn_clf.score(x_test_reduction, y_test)
print("After PCA, knn acc is %s: " % acc)

显示代码

# Plot how the cumulative explained variance grows with the number of components;
# refit PCA with all components so explained_variance_ratio_ covers every dimension.
pca_all = PCA(n_components=x_train.shape[1])
pca_all.fit(x_train)
plt.plot(range(1, x_train.shape[1] + 1),
         np.cumsum(pca_all.explained_variance_ratio_))
plt.show()

# Visualize the dimension-reduced data (one colour per digit class)
pca = PCA(n_components=2)
x_reduction = pca.fit_transform(x)

for i in range(10):
    plt.scatter(x_reduction[y == i, 0], x_reduction[y == i, 1], alpha=0.8)
plt.show()
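The cumulative explained-variance curve plotted above is the usual way to pick n_components. scikit-learn can also do this automatically: when n_components is a float between 0 and 1, it keeps just enough components to reach that fraction of variance (the 0.95 threshold below is my own example, not a value from the original post):

pca = PCA(n_components=0.95)        # keep enough components to explain 95% of the variance
pca.fit(x_train)
print(pca.n_components_)            # number of components actually selected
x_train_reduction = pca.transform(x_train)
x_test_reduction = pca.transform(x_test)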
    

Complete code

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# K-nearest-neighbor classifier
from sklearn.neighbors import KNeighborsClassifier
# PCA
from sklearn.decomposition import PCA

if __name__ == '__main__':
    digits = datasets.load_digits()
    x = digits.data
    y = digits.target

    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=666)

    # k-NN baseline on the original 64-dimensional digit features
    knn_clf = KNeighborsClassifier()
    knn_clf.fit(x_train, y_train)
    acc = knn_clf.score(x_test, y_test)
    print("knn acc is %s: " % acc)

    # PCA
    pca = PCA(n_components=2)
    pca.fit(x_train)  # PCA is unsupervised, so no y is needed
    x_train_reduction = pca.transform(x_train)
    x_test_reduction = pca.transform(x_test)

    knn_clf = KNeighborsClassifier()
    knn_clf.fit(x_train_reduction, y_train)
    acc = knn_clf.score(x_test_reduction, y_test)
    print("After PCA, knn acc is %s: " % acc)

    # Plot how the cumulative explained variance grows with the number of components;
    # refit PCA with all components so explained_variance_ratio_ covers every dimension.
    pca_all = PCA(n_components=x_train.shape[1])
    pca_all.fit(x_train)
    plt.plot(range(1, x_train.shape[1] + 1),
             np.cumsum(pca_all.explained_variance_ratio_))
    plt.show()

    # Visualize the dimension-reduced data (one colour per digit class)
    pca = PCA(n_components=2)
    x_reduction = pca.fit_transform(x)

    for i in range(10):
        plt.scatter(x_reduction[y == i, 0], x_reduction[y == i, 1], alpha=0.8)
    plt.show()

Source: https://blog.csdn.net/huxw_magus/article/details/110705746