ML: Anomaly Detection | Multivariate Gaussian Distribution
Author: Internet
Source: Coursera, Machine Learning by Andrew Ng (Stanford University)
Anomaly Detection
assumption:
Each feature follows a Gaussian distribution:
$$ x_j \sim N(\mu_j, \sigma_j^2) $$
The features are assumed to be independent, so for each example $x$ the joint density factorizes:
$$ p(x) = \prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2) $$
feature selection:
Choose features that take unusually large or small values when an example is an anomaly.
Create features that capture correlations between existing features, e.g. the ratio of two features that are normally linearly related.
For non-Gaussian features, apply a transform that makes the histogram look more Gaussian, e.g. a fractional power ($\sqrt{x}$, $\sqrt[3]{x}$) or the logarithm ($\log(x)$).
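One way to compare candidate transforms is to check how much each one reduces the skewness of a feature. The sketch below uses only the standard library. The sample values are made up for illustration, and `log1p` (that is, $\log(x+1)$) stands in for $\log(x)$ so that zeros stay finite. In practice the histogram would also be inspected by eye.

```python
import math


def skewness(values):
    """Third standardized moment; 0 for perfectly symmetric data."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    third = sum((v - mean) ** 3 for v in values) / m
    return third / var ** 1.5


# Hypothetical right-skewed feature (made-up values, for illustration only).
raw = [1.0, 1.2, 1.5, 2.0, 2.2, 3.0, 4.5, 7.0, 12.0, 40.0]

# Candidate transforms; log1p(x) = log(x + 1) is an assumption made here
# so that a feature value of 0 does not produce -infinity.
transforms = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "cube_root": lambda v: v ** (1.0 / 3.0),
    "log1p": math.log1p,
}

skew_by_transform = {
    name: skewness([f(v) for v in raw]) for name, f in transforms.items()
}
for name, s in skew_by_transform.items():
    print(f"{name:>9}: skewness = {s:.3f}")
```

A transform whose skewness is closer to 0 gives a more symmetric, and usually more Gaussian-looking, feature.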
algorithm process:
1. compute the means and variances for each feature:
$$ \mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)} $$
$$ \sigma_j^2 = \frac{1}{m} \sum_{i=1}^m (x_j^{(i)} - \mu_j)^2 $$
2. compute $p(x)$ for a new example $x$:
$$ p(x) = \prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j}\exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right) $$
3. check if it is an anomaly by:
$$ y = \begin{cases} 1 & \text{if } p(x) < \epsilon \\ 0 & \text{if } p(x) \geq \epsilon \end{cases} $$
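The three steps above can be sketched in plain Python as follows. The training data and the threshold `epsilon` are made-up values for illustration, and the function names are hypothetical, not taken from the course.

```python
import math


def fit_independent_gaussians(X):
    """Step 1: per-feature means and variances (1/m normalisation)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2


def density(x, mu, sigma2):
    """Step 2: p(x) as a product of univariate Gaussian densities."""
    p = 1.0
    for xj, mj, s2 in zip(x, mu, sigma2):
        p *= math.exp(-((xj - mj) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return p


def is_anomaly(x, mu, sigma2, epsilon):
    """Step 3: y = 1 if p(x) < epsilon, else 0."""
    return 1 if density(x, mu, sigma2) < epsilon else 0


# Hypothetical training set of normal (y = 0) examples with two features.
X_train = [[10.0, 5.0], [10.5, 5.2], [9.5, 4.8], [10.2, 5.1], [9.8, 4.9]]
mu, sigma2 = fit_independent_gaussians(X_train)

epsilon = 1e-3  # assumed threshold, chosen by hand for this toy data

print(is_anomaly([10.1, 5.0], mu, sigma2, epsilon))  # close to the training data
print(is_anomaly([14.0, 2.0], mu, sigma2, epsilon))  # far from the training data
```

With many features the product can underflow to 0, so a sum of log densities compared against $\log\epsilon$ is probably the safer choice in practice.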
alternative assumption - multivariate Gaussian distribution:
1. compute the mean vector $\mu \in \mathbb{R}^n$ and the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$:
$$ \mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} $$
$$ \Sigma = \frac{1}{m} \sum_{i=1}^{m}(x^{(i)} - \mu)(x^{(i)} - \mu)^{T} $$
2. compute $p(x)$ for a new example $x$:
$$ p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) $$
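A minimal standard-library sketch of the multivariate model follows. Instead of inverting $\Sigma$ directly, it uses a Cholesky factorization $\Sigma = LL^T$, which yields both $|\Sigma|$ and the quadratic form. That is an implementation choice made here, not something the course prescribes. The data and function names are hypothetical.

```python
import math


def fit_multivariate_gaussian(X):
    """Step 1: mean vector and covariance matrix (1/m normalisation)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    Sigma = [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in X) / m
              for k in range(n)] for j in range(n)]
    return mu, Sigma


def cholesky(A):
    """Lower-triangular L with A = L L^T; A must be positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def multivariate_density(x, mu, Sigma):
    """Step 2: multivariate Gaussian density p(x)."""
    n = len(x)
    L = cholesky(Sigma)
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L z = (x - mu); then z.z equals
    # (x - mu)^T Sigma^{-1} (x - mu).
    z = []
    for i in range(n):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log |Sigma|
    log_p = -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
    return math.exp(log_p)


# Hypothetical training data in which feature 2 is roughly twice feature 1.
X_train = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
mu, Sigma = fit_multivariate_gaussian(X_train)

p_on = multivariate_density([4.0, 8.0], mu, Sigma)   # follows the correlation
p_off = multivariate_density([4.0, 4.0], mu, Sigma)  # breaks the correlation
print(p_on, p_off)
```

Both test points are equally far from the mean in each individual feature, so a model with independent features would give them the same density. Only the full covariance matrix separates them.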
examples:
The original model corresponds to the multivariate Gaussian model when $\Sigma$ is a diagonal matrix:
$$ \Sigma = \begin{bmatrix}\sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_n^2 \\\end{bmatrix} $$
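A short derivation shows why, using only the two density formulas given earlier. For a diagonal $\Sigma$, the determinant and the quadratic form reduce to:
$$ |\Sigma| = \prod_{j=1}^{n}\sigma_j^2, \qquad (x-\mu)^T\Sigma^{-1}(x-\mu) = \sum_{j=1}^{n}\frac{(x_j-\mu_j)^2}{\sigma_j^2} $$
Substituting both into the multivariate density gives the product of univariate densities:
$$ p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}\prod_{j=1}^{n}\sigma_j}\exp\left(-\frac{1}{2}\sum_{j=1}^{n}\frac{(x_j-\mu_j)^2}{\sigma_j^2}\right) = \prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\sigma_j}\exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right) $$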
comparison between the two models:
evaluating an anomaly detection system:
training set: 60% of the negative ($y=0$) examples
cross validation set: 20% of the negative examples, 50% of the positive examples
test set: 20% of the negative examples, 50% of the positive examples
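evaluation method: classification error and $F_1$ score; because the data for anomaly detection are usually heavily skewed (very few $y=1$ examples), the $F_1$ score is the more informative of the two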
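The sketch below shows why accuracy misleads on skewed data and how the $F_1$ score can be computed. It also assumes that the cross validation set is used to choose the threshold $\epsilon$, which these notes do not state explicitly. The densities `p_cv` and labels `y_cv` are made-up values, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R), with y = 1 (anomaly) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def select_epsilon(p_cv, y_cv, steps=1000):
    """Scan thresholds between min and max p(x); keep the best F1."""
    lo, hi = min(p_cv), max(p_cv)
    best_eps, best_f1 = lo, 0.0
    for k in range(1, steps + 1):
        eps = lo + (hi - lo) * k / steps
        y_pred = [1 if p < eps else 0 for p in p_cv]
        f1 = f1_score(y_cv, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1


# Hypothetical cross validation set: p(x) for each example and its label.
p_cv = [0.30, 0.25, 0.28, 0.002, 0.31, 0.27, 0.001, 0.29, 0.26, 0.05]
y_cv = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]

# A detector that never flags anything still reaches 80% accuracy here,
# but its F1 score is 0.
never_flag = [0] * len(y_cv)
accuracy = sum(1 for t, p in zip(y_cv, never_flag) if t == p) / len(y_cv)
print(accuracy, f1_score(y_cv, never_flag))

best_eps, best_f1 = select_epsilon(p_cv, y_cv)
print(best_eps, best_f1)
```

The chosen threshold would then be evaluated once on the test set, which keeps the reported score independent of the tuning.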
anomaly detection vs. supervised learning:
Source: https://www.cnblogs.com/ms-qwq/p/16484706.html