ML: Dimensionality Reduction - Principal Component Analysis
Source: Machine Learning, taught by Andrew Ng (Stanford University) on Coursera
Dimensionality Reduction - Principal Component Analysis (PCA)
notations:
$u_k \in \mathbb{R}^n$: the $k$-th principal component (direction of variation)
$z^{(i)} \in \mathbb{R}^k$: the projection of the $i$-th example $x^{(i)}$ onto the first $k$ principal components
$x_{approx}^{(i)} \in \mathbb{R}^n$: the approximation of $x^{(i)}$ recovered from its projection $z^{(i)}$
problem formulation:
For an $n$-dimensional input dataset, reduce it to $k$ dimensions. That is, find $k$ vectors ($u_1, u_2, \cdots, u_k$) onto which to project the data so as to minimize the average squared projection error:
$$ error = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)}\right\| ^ 2 $$
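A minimal Octave sketch of this error, assuming X and X_approx are m-by-n matrices holding $x^{(i)}$ and $x_{approx}^{(i)}$ as rows (the variable names are assumptions):
m = size(X, 1);                                % number of examples
proj_err = sum(sum((X - X_approx) .^ 2)) / m;  % average squared projection error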
algorithm process:
1. perform feature scaling and mean normalization on the original dataset $x^{(i)} \in \mathbb{R}^n$
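A minimal Octave sketch of this step, assuming X is the m-by-n data matrix with one example per row (the variable names mu, sigma, and X_norm are assumptions):
mu = mean(X);                  % 1-by-n vector of feature means
sigma = std(X);                % 1-by-n vector of per-feature standard deviations
X_norm = (X - mu) ./ sigma;    % mean-normalize, then scale each feature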
2. compute the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$:
$$ \Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)})(x^{(i)})^{T} $$
Sigma = X' * X / m;   % X is m-by-n, one (normalized) example per row
3. compute the eigenvectors of the covariance matrix using:
[U, S, V] = svd(Sigma);   % columns of U are the eigenvectors of Sigma
$$ U = \begin{bmatrix}| & | & & | \\u_1 & u_2 & \cdots & u_n \\| & | & & | \\\end{bmatrix} $$
4. select the first $k$ columns of matrix $U \in \mathbb{R}^{n \times n}$ as the $k$ principal components:
U_reduce = U(:, 1:k);
5. project each $x^{(i)}$ onto the principal components to obtain a $k$-dimensional vector $z^{(i)}$:
$$ z^{(i)} = U_{reduce}^{T}x^{(i)} $$
Z = X * U_reduce;   % m-by-k, one projected example per row
6. reconstruction from compressed representation:
$$ x_{approx}^{(i)} = U_{reduce}z^{(i)} $$
X_approx = Z * U_reduce';   % m-by-n, row-wise reconstruction of X
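Putting the steps together, a minimal end-to-end Octave sketch, assuming X is the already mean-normalized m-by-n data matrix and k is chosen in advance:
[m, n] = size(X);
Sigma = X' * X / m;            % n-by-n covariance matrix
[U, S, V] = svd(Sigma);        % columns of U are the principal components
U_reduce = U(:, 1:k);          % keep the first k components (n-by-k)
Z = X * U_reduce;              % m-by-k compressed representation
X_approx = Z * U_reduce';      % m-by-n reconstruction of the data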
choosing the number of principal components:
The average squared projection error is:
$$ error = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)}\right\| ^ 2 $$
The total variation of the dataset is:
$$ variation = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)}\right\|^2 $$
Typically, choose $k$ to be the smallest value so that:
$$ \frac{error}{variation} = \frac{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)}\right\| ^ 2}{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)}\right\|^2} \leq 0.01 $$
i.e. 99% of variation is retained.
In practice, this ratio can be computed directly from the diagonal matrix $S$ returned by svd:
$$ \frac{error}{variation} = 1 - \frac{\sum_{i=1}^{k}S_{ii}}{\sum_{i=1}^{n}S_{ii}} $$
$$ S = \begin{bmatrix}S_{11} & & & \\ & S_{22} & & \\ & & \ddots & \\ & & & S_{nn} \\\end{bmatrix} $$
Hence, svd only needs to be run once; then pick the smallest $k$ such that:
$$ \frac{\sum_{i=1}^{k}S_{ii}}{\sum_{i=1}^{n}S_{ii}} \geq 0.99 $$
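A minimal Octave sketch of this selection rule, using the diagonal matrix S returned by svd (the variable names s, retained, and k are assumptions):
s = diag(S);                   % vector of diagonal entries S_11, ..., S_nn
retained = cumsum(s) / sum(s); % fraction of variation retained for k = 1, ..., n
k = find(retained >= 0.99, 1); % smallest k retaining at least 99%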
usages of PCA:
- data compression: reduce the memory/disk space needed and speed up learning algorithms
- data visualization: reduce the data to 2D or 3D so that it can be plotted (a minimal sketch follows this list)
- improper use: applying PCA to prevent overfitting; use regularization instead, since PCA discards some information without considering the labels
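For the visualization use case, a minimal Octave sketch that reduces an already mean-normalized data matrix X (one example per row) to 2D and plots it:
m = size(X, 1);
[U, S, V] = svd(X' * X / m);   % principal components of the data
Z = X * U(:, 1:2);             % project onto the top 2 components
plot(Z(:, 1), Z(:, 2), 'o');   % 2D scatter plot of the compressed data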
Originally posted at: https://www.cnblogs.com/ms-qwq/p/16484697.html