
ML: Anomaly Detection | Multivariate Gaussian Distribution


Source: *Machine Learning*, taught by Andrew Ng (Stanford University) on Coursera


Anomaly Detection

assumption:

Each feature follows Gaussian distribution:

$$ x_j \sim N(\mu_j, \sigma_j^2) $$

And they are independent, i.e. for each example $x$:

$$ p(x) = \prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2) $$

feature selection:

Choose features that take on unusually large or small values when an anomaly occurs.

Create features that capture correlations between existing features, e.g. the ratio of two linearly correlated features.

For non-Gaussian features, apply a power transform (e.g. $\sqrt{x}$, $\sqrt[3]{x}$) or take the logarithm ($\log(x)$) to make them more Gaussian.
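As a quick illustration of the last point (a minimal sketch on synthetic data; NumPy and the helper `skewness` are assumptions, not from the course), a log transform pulls a right-skewed feature much closer to Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
# a right-skewed (log-normal) feature is far from Gaussian
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

def skewness(v):
    """Sample skewness (third standardized moment); ~0 for Gaussian data."""
    z = (v - v.mean()) / v.std()
    return float((z ** 3).mean())

print(skewness(x))          # strongly positive for the raw feature
print(skewness(np.log(x)))  # near 0 after the log transform
```

In practice one would check the transformed feature's histogram rather than a single statistic, but the idea is the same.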

algorithm process:

1. compute the means and variances for each feature:

$$ \mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)} $$

$$ \sigma_j^2 = \frac{1}{m} \sum_{i=1}^m (x_j^{(i)} - \mu_j)^2 $$

2. compute $p(x)$ for a new example $x$:

$$ p(x) = \prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j}\exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right) $$

3. check if it is an anomaly by:

$$ y = \begin{cases}1 & \text{if } p(x)<\epsilon \\ 0 & \text{if } p(x)\geq\epsilon\end{cases} $$
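The three steps above can be sketched in NumPy (a minimal sketch; the synthetic data and the threshold value are illustrative assumptions, not from the course):

```python
import numpy as np

def fit_gaussian(X):
    """Step 1: per-feature means and variances (X is m x n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # uses the 1/m normalizer, as in the notes
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 2: product of independent univariate Gaussian densities."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

def is_anomaly(X, mu, sigma2, epsilon):
    """Step 3: flag y = 1 when p(x) < epsilon."""
    return p(X, mu, sigma2) < epsilon

rng = np.random.default_rng(1)
X_train = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian(X_train)

X_new = np.array([[0.0, 5.0],    # typical point
                  [8.0, -6.0]])  # far from the training distribution
print(is_anomaly(X_new, mu, sigma2, epsilon=1e-4))
```

In a real system $\epsilon$ would be tuned on a labeled cross-validation set rather than fixed by hand.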

alternative assumption - multivariate Gaussian distribution:

1. compute $\mu \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n \times n}$:

$$ \mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} $$

$$ \Sigma = \frac{1}{m} \sum_{i=1}^{m}(x^{(i)} - \mu)(x^{(i)} - \mu)^{T} $$

2. compute $p(x)$ for a new example $x$:

$$ p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) $$
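A corresponding sketch for the multivariate model (NumPy assumed; the correlated synthetic data is illustrative). Note how a point that looks unremarkable feature-by-feature can still get a low density because it violates the correlation:

```python
import numpy as np

def fit_multivariate(X):
    """mu in R^n and Sigma in R^{n x n}, both with the 1/m normalizer."""
    mu = X.mean(axis=0)
    D = X - mu
    Sigma = D.T @ D / X.shape[0]
    return mu, Sigma

def p_multivariate(X, mu, Sigma):
    """Multivariate Gaussian density, one value per row of X."""
    n = mu.shape[0]
    D = X - mu
    inv = np.linalg.inv(Sigma)
    norm = (2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5
    quad = np.einsum('ij,jk,ik->i', D, inv, D)  # (x-mu)^T Sigma^{-1} (x-mu) per row
    return np.exp(-0.5 * quad) / norm

rng = np.random.default_rng(2)
# two positively correlated features, which the original model would miss
X_train = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2000)
mu, Sigma = fit_multivariate(X_train)

# first point breaks the positive correlation; second point respects it
x = np.array([[2.0, -2.0], [2.0, 2.0]])
print(p_multivariate(x, mu, Sigma))  # first density is far smaller
```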


The original model corresponds to the multivariate Gaussian model when $\Sigma$ is a diagonal matrix:

$$ \Sigma = \begin{bmatrix}\sigma_1^2 &  &  &  \\ & \sigma_2^2 &  &  \\ &  & \ddots &  \\ &  &  & \sigma_n^2 \\\end{bmatrix} $$
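This equivalence is easy to verify numerically (a minimal NumPy sketch; the data and the test point are arbitrary): building $\Sigma$ as a diagonal matrix of the per-feature variances makes the multivariate density agree with the product of univariate densities.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3)) * [1.0, 2.0, 0.5] + [0.0, 3.0, -1.0]
mu = X.mean(axis=0)
sigma2 = X.var(axis=0)

x = np.array([0.5, 2.0, -1.5])

# original model: product of independent univariate densities
p_orig = np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

# multivariate model with a diagonal Sigma built from the same variances
Sigma = np.diag(sigma2)
n = len(mu)
d = x - mu
p_multi = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5
)

print(p_orig, p_multi)  # identical up to floating-point error
```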

comparison between the two models:

| original model | multivariate Gaussian model |
| --- | --- |
| need to manually create features that capture the correlations between features | automatically captures those correlations |
| computationally cheaper | computationally more expensive |
| works fine even if $m$ is small | must have $m > n$, otherwise $\Sigma$ is non-invertible |
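The non-invertibility in the last row can be seen directly (a small NumPy check; the dimensions are arbitrary): with $m \leq n$, the sample covariance matrix has rank at most $m-1$, so it is singular.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 10  # fewer examples than features
X = rng.normal(size=(m, n))
mu = X.mean(axis=0)
D = X - mu
Sigma = D.T @ D / m

# centering costs one degree of freedom, so rank(Sigma) <= m - 1 < n
print(np.linalg.matrix_rank(Sigma))  # 4, not 10: Sigma is non-invertible
```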

evaluating an anomaly detection system:

training set: 60% of the negative ($y=0$) examples

cross validation set: 20% of the negative examples, 50% of the positive examples

test set: 20% of the negative examples, 50% of the positive examples

evaluation metric: the $F_1$ score rather than raw classification error (because the data for anomaly detection are typically highly skewed, error alone is uninformative)
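To see why skew matters (a toy sketch; the counts and the helper `f1_score` are illustrative assumptions): on a cross-validation set with 1% anomalies, a detector that never flags anything scores 99% accuracy but an $F_1$ of 0.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from binary labels (1 = anomaly), via precision and recall."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# skewed toy cross-validation set: 990 normal, 10 anomalous
y_true = np.array([0] * 990 + [1] * 10)

# a useless "always normal" detector: 99% accuracy, F1 = 0
always_normal = np.zeros_like(y_true)
print(f1_score(y_true, always_normal))  # 0.0

# a detector catching 8/10 anomalies with 2 false alarms: F1 = 0.8
y_pred = np.array([0] * 988 + [1, 1] + [1] * 8 + [0, 0])
print(f1_score(y_true, y_pred))  # 0.8
```

The same routine can be run for a range of $\epsilon$ values on the cross-validation set to pick the threshold with the best $F_1$.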

anomaly detection vs. supervised learning:

| anomaly detection | supervised learning |
| --- | --- |
| large number of negative examples, very small number of positive examples | large number of both negative and positive examples |
| hard for the algorithm to learn all types of anomalies | enough positive examples for the algorithm to learn most (if not all) types of anomalies |

 

Source: https://www.cnblogs.com/ms-qwq/p/16484706.html