
ML: Anomaly Detection | Multivariate Gaussian Distribution


Source: *Machine Learning*, taught by Andrew Ng (Stanford University) on Coursera


Anomaly Detection

assumption:

Each feature follows Gaussian distribution:

$$ x_j \sim N(\mu_j, \sigma_j^2) $$

And they are independent, i.e. for each example $x$:

$$ p(x) = \prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2) $$

feature selection:

Choose features that take on unusually large or small values when an anomaly occurs.

Create features that capture correlations between existing features, e.g. the ratio of two linearly correlated features.

For non-Gaussian features, apply a power transform (e.g. $\sqrt{x}$, $\sqrt[3]{x}$) or take the logarithm ($\log(x)$) to make them more Gaussian.
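As a quick illustration of the last point (a minimal sketch on synthetic data; NumPy and the helper `skewness` are assumptions, not from the course), a log transform pulls a right-skewed feature much closer to Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
# a right-skewed (log-normal) feature is far from Gaussian
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

def skewness(v):
    """Sample skewness (third standardized moment); ~0 for Gaussian data."""
    z = (v - v.mean()) / v.std()
    return float((z ** 3).mean())

print(skewness(x))          # strongly positive for the raw feature
print(skewness(np.log(x)))  # near 0 after the log transform
```

In practice one would check the transformed feature's histogram rather than a single statistic, but the idea is the same.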

algorithm process:

1. compute the means and variances for each feature:

$$ \mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)} $$

$$ \sigma_j^2 = \frac{1}{m} \sum_{i=1}^m (x_j^{(i)} - \mu_j)^2 $$

2. compute $p(x)$ for a new example $x$:

$$ p(x) = \prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j}\exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right) $$

3. check if it is an anomaly by:

$$ y = \begin{cases}1 & \text{if } p(x)<\epsilon \\ 0 & \text{if } p(x)\geq\epsilon\end{cases} $$
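The three steps above can be sketched in NumPy (a minimal sketch; the synthetic data and the threshold value are illustrative assumptions, not from the course):

```python
import numpy as np

def fit_gaussian(X):
    """Step 1: per-feature means and variances (X is m x n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # uses the 1/m normalizer, as in the notes
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 2: product of independent univariate Gaussian densities."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

def is_anomaly(X, mu, sigma2, epsilon):
    """Step 3: flag y = 1 when p(x) < epsilon."""
    return p(X, mu, sigma2) < epsilon

rng = np.random.default_rng(1)
X_train = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian(X_train)

X_new = np.array([[0.0, 5.0],    # typical point
                  [8.0, -6.0]])  # far from the training distribution
print(is_anomaly(X_new, mu, sigma2, epsilon=1e-4))
```

In a real system $\epsilon$ would be tuned on a labeled cross-validation set rather than fixed by hand.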

alternative assumption - multivariate Gaussian distribution:

1. compute $\mu \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n \times n}$:

$$ \mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} $$

$$ \Sigma = \frac{1}{m} \sum_{i=1}^{m}(x^{(i)} - \mu)(x^{(i)} - \mu)^{T} $$

2. compute $p(x)$ for a new example $x$:

$$ p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) $$
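A corresponding sketch for the multivariate model (NumPy assumed; the correlated synthetic data is illustrative). Note how a point that looks unremarkable feature-by-feature can still get a low density because it violates the correlation:

```python
import numpy as np

def fit_multivariate(X):
    """mu in R^n and Sigma in R^{n x n}, both with the 1/m normalizer."""
    mu = X.mean(axis=0)
    D = X - mu
    Sigma = D.T @ D / X.shape[0]
    return mu, Sigma

def p_multivariate(X, mu, Sigma):
    """Multivariate Gaussian density, one value per row of X."""
    n = mu.shape[0]
    D = X - mu
    inv = np.linalg.inv(Sigma)
    norm = (2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5
    quad = np.einsum('ij,jk,ik->i', D, inv, D)  # (x-mu)^T Sigma^{-1} (x-mu) per row
    return np.exp(-0.5 * quad) / norm

rng = np.random.default_rng(2)
# two positively correlated features, which the original model would miss
X_train = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2000)
mu, Sigma = fit_multivariate(X_train)

# first point breaks the positive correlation; second point respects it
x = np.array([[2.0, -2.0], [2.0, 2.0]])
print(p_multivariate(x, mu, Sigma))  # first density is far smaller
```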


The original model corresponds to the multivariate Gaussian model when $\Sigma$ is a diagonal matrix:

$$ \Sigma = \begin{bmatrix}\sigma_1^2 &  &  &  \\ & \sigma_2^2 &  &  \\ &  & \ddots &  \\ &  &  & \sigma_n^2 \\\end{bmatrix} $$
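This equivalence is easy to verify numerically (a minimal NumPy sketch; the data and the test point are arbitrary): building $\Sigma$ as a diagonal matrix of the per-feature variances makes the multivariate density agree with the product of univariate densities.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3)) * [1.0, 2.0, 0.5] + [0.0, 3.0, -1.0]
mu = X.mean(axis=0)
sigma2 = X.var(axis=0)

x = np.array([0.5, 2.0, -1.5])

# original model: product of independent univariate densities
p_orig = np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

# multivariate model with a diagonal Sigma built from the same variances
Sigma = np.diag(sigma2)
n = len(mu)
d = x - mu
p_multi = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5
)

print(p_orig, p_multi)  # identical up to floating-point error
```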

comparison between the two models:

| original model | multivariate Gaussian model |
| --- | --- |
| need to manually create features that capture the correlations between features | automatically captures those correlations |
| computationally cheaper | computationally more expensive |
| works fine even if $m$ is small | must have $m > n$, otherwise $\Sigma$ is non-invertible |
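The non-invertibility in the last row can be seen directly (a small NumPy check; the dimensions are arbitrary): with $m \leq n$, the sample covariance matrix has rank at most $m-1$, so it is singular.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 10  # fewer examples than features
X = rng.normal(size=(m, n))
mu = X.mean(axis=0)
D = X - mu
Sigma = D.T @ D / m

# centering costs one degree of freedom, so rank(Sigma) <= m - 1 < n
print(np.linalg.matrix_rank(Sigma))  # 4, not 10: Sigma is non-invertible
```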

evaluating an anomaly detection system:

training set: 60% of the negative ($y=0$) examples

cross validation set: 20% of the negative examples, 50% of the positive examples

test set: 20% of the negative examples, 50% of the positive examples

evaluation metric: the $F_1$ score rather than raw classification error (because the data for anomaly detection are typically highly skewed, error alone is uninformative)
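To see why skew matters (a toy sketch; the counts and the helper `f1_score` are illustrative assumptions): on a cross-validation set with 1% anomalies, a detector that never flags anything scores 99% accuracy but an $F_1$ of 0.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from binary labels (1 = anomaly), via precision and recall."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# skewed toy cross-validation set: 990 normal, 10 anomalous
y_true = np.array([0] * 990 + [1] * 10)

# a useless "always normal" detector: 99% accuracy, F1 = 0
always_normal = np.zeros_like(y_true)
print(f1_score(y_true, always_normal))  # 0.0

# a detector catching 8/10 anomalies with 2 false alarms: F1 = 0.8
y_pred = np.array([0] * 988 + [1, 1] + [1] * 8 + [0, 0])
print(f1_score(y_true, y_pred))  # 0.8
```

The same routine can be run for a range of $\epsilon$ values on the cross-validation set to pick the threshold with the best $F_1$.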

anomaly detection vs. supervised learning:

| anomaly detection | supervised learning |
| --- | --- |
| large number of negative examples, very small number of positive examples | large number of both negative and positive examples |
| hard for the algorithm to learn all types of anomalies | enough positive examples for the algorithm to learn most (if not all) types of anomalies |

 

Source: https://www.cnblogs.com/ms-qwq/p/16484706.html