regularization

2021-08-03 15:02:08 Author: Internet

Regularization

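Concepts:

In some applications a result depends on many parameters. A comprehensive evaluation of a student, for example, involves a whole list of criteria. Two problems can arise in this setting: underfitting and overfitting.

Underfitting: the model is too simple to capture the pattern in the data, for example when it leans heavily on one parameter or even uses that parameter as its only criterion.

Overfitting: the model fits every parameter closely, including ones that are actually irrelevant (they look important but contribute very little to the result), which greatly increases the complexity of the fitted function.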

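Regularization is the usual way to curb overfitting. Its strength has to be tuned, because a penalty that is too strong pushes the model back toward underfitting.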

We regularize by adding a penalty term to the loss. This is the original loss function:
$$cost = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
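Now we add the regularization term:
$$cost = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$
The regularization term shrinks the coefficients of all features. Every feature is kept in the model, and the size of $\lambda$ controls how closely the curve is allowed to fit the sample data. The sum over $j$ starts at 1, so $\theta_0$ is not penalized.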
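As a rough illustration, the regularized cost can be computed in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `regularized_cost`, the toy data, and the convention that the first column of `X` is all ones are not from the original post, and the penalty is taken inside the $\frac{1}{2m}$ factor with $\theta_0$ left unpenalized.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error cost with an L2 penalty (hypothetical helper).

    X is an (m, n+1) matrix whose first column is all ones (bias term),
    theta has n+1 entries, and theta[0] is left out of the penalty.
    """
    m = len(y)
    residual = X @ theta - y                 # h_theta(x) - y for every sample
    data_term = residual @ residual          # sum of squared errors
    penalty = lam * np.sum(theta[1:] ** 2)   # lambda * sum of theta_j^2, j >= 1
    return (data_term + penalty) / (2 * m)

# Toy data (assumed for illustration): y = 2 * x exactly.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])

cost_plain = regularized_cost(theta, X, y, lam=0.0)
cost_penalized = regularized_cost(theta, X, y, lam=1.0)
print(cost_plain, cost_penalized)
```

Here `theta` fits the toy samples exactly, so the data term is zero and whatever cost remains at `lam=1.0` comes entirely from the penalty.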

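The larger $\lambda$ is, the larger the penalty, and because the cost has to be minimized, the components of $\theta$ are pushed toward smaller values. At the same time, $\theta$ still carries weight in the first (data-fit) term: changing $\theta$ changes the fit error in the first term as well as the penalty in the second, and $\lambda$ sets the balance between the two. The minimization still yields a $\theta$ vector, and components that come out very small can be dropped to simplify the model.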
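The shrinking effect of $\lambda$ can be checked numerically. The sketch below does not use the post's gradient descent; it swaps in the closed-form ridge solution $\theta = (X^TX + \lambda D)^{-1}X^Ty$, where $D$ is the identity matrix with its first diagonal entry set to zero so that $\theta_0$ is not penalized. The helper name `ridge_closed_form` and the synthetic data (one informative feature, one irrelevant feature) are assumptions made for illustration.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of sum((X @ theta - y)**2) + lam * sum(theta[1:]**2)."""
    n = X.shape[1]
    D = np.eye(n)
    D[0, 0] = 0.0                          # do not penalize the bias theta_0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# Synthetic data (assumed): y depends on x1 only, x2 is irrelevant.
rng = np.random.default_rng(0)
m = 50
x1 = rng.normal(size=m)
x2 = rng.normal(size=m)
X = np.column_stack([np.ones(m), x1, x2])
y = 1.0 + 3.0 * x1 + 0.1 * rng.normal(size=m)

lams = [0.0, 1.0, 10.0, 100.0]
norms = []
for lam in lams:
    theta = ridge_closed_form(X, y, lam)
    norms.append(np.linalg.norm(theta[1:]))   # size of the penalized part
    print(f"lambda={lam:6.1f}  theta={np.round(theta, 3)}")
```

The norm of the penalized coefficients falls as `lam` grows. The coefficients get smaller but do not in general become exactly zero, which is why dropping the tiny ones is a separate, manual step.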

When regularization is introduced into linear regression, the partial derivatives of the cost $J$ with respect to each $\theta$ are:
$$\begin{aligned}
\frac{\partial J}{\partial\theta_0} &= \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\\
\frac{\partial J}{\partial\theta_j} &= \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j
\end{aligned}$$
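The penalized derivative follows from differentiating the cost term by term. A short derivation, assuming the cost is written with the penalty inside the $\frac{1}{2m}$ factor, that $h_\theta(x) = \sum_j \theta_j x_j$ with $x_0 = 1$, and that $x^{(i)}$ denotes the $i$-th sample:

$$\begin{aligned}
\frac{\partial J}{\partial\theta_j}
&= \frac{\partial}{\partial\theta_j}\,\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{k=1}^{n}\theta_k^2\right] \\
&= \frac{1}{2m}\left[\sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + 2\lambda\theta_j\right] \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j
\end{aligned}$$

If the penalty is instead kept outside the $\frac{1}{2m}$ factor, the last term becomes $2\lambda\theta_j$.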
Gradient descent updates each parameter by $\theta_j = \theta_j - \alpha \frac{\partial J}{\partial\theta_j}$, which gives:
$$\begin{aligned}
\theta_0 &= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \\
\theta_j &= \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \frac{\alpha}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}
\end{aligned}$$
Collecting the $\theta_j$ terms produces the factor $\left(1 - \alpha\frac{\lambda}{m}\right)$, so each step first shrinks $\theta_j$ slightly and then applies the usual gradient correction.
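The update rule above can be sketched as a short batch gradient descent loop. This is a minimal illustration, not the original author's code: the function name `gradient_descent_l2`, the learning rate, the step count, and the synthetic data are all assumptions, and `X` is assumed to carry a leading column of ones for $\theta_0$.

```python
import numpy as np

def gradient_descent_l2(X, y, lam, alpha=0.1, steps=2000):
    """Batch gradient descent for linear regression with an L2 penalty.

    X: (m, n+1) matrix with a leading column of ones.
    theta[0] (the bias) is updated without the shrink factor.
    """
    m, n = X.shape
    theta = np.zeros(n)
    shrink = np.full(n, 1.0 - alpha * lam / m)   # factor (1 - alpha*lambda/m)
    shrink[0] = 1.0                              # theta_0 is not penalized
    for _ in range(steps):
        error = X @ theta - y                    # h_theta(x) - y per sample
        grad = X.T @ error / m                   # gradient of the data term
        theta = theta * shrink - alpha * grad
    return theta

# Synthetic data (assumed): the last feature is irrelevant.
rng = np.random.default_rng(1)
m = 100
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
y = 2.0 + 1.5 * X[:, 1] + 0.05 * rng.normal(size=m)

theta_plain = gradient_descent_l2(X, y, lam=0.0)
theta_reg = gradient_descent_l2(X, y, lam=50.0)
print(np.round(theta_plain, 3), np.round(theta_reg, 3))
```

With `lam=0.0` the loop reduces to ordinary gradient descent for least squares. With a positive `lam` the penalized coefficients come out smaller in magnitude, while the bias is only affected indirectly.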

Source: https://blog.csdn.net/qq_46141221/article/details/119349892