关于近端梯度下降的理解
作者:互联网
近端梯度下降
目标函数
min
x
Q
(
x
)
=
f
(
x
)
+
g
(
x
)
(1)
\textit{min}_{x}Q(x)=f(x)+g(x) \tag {1}
minxQ(x)=f(x)+g(x)(1)
f
(
x
)
f(x)
f(x)为可微凸函数,
g
(
x
)
g(x)
g(x)非光滑,
x
∈
H
x\in{H}
x∈H。
考虑 f ( x ) f(x) f(x)
梯度下降
x
k
+
1
=
x
k
−
α
∇
f
(
x
)
(2)
x_{k+1}=x_{k}-\alpha\nabla{f(x)} \tag {2}
xk+1=xk−α∇f(x)(2)
f
(
x
)
f(x)
f(x)在
x
k
x_{k}
xk处泰勒展开
f
(
x
k
+
1
)
≈
f
(
x
k
)
+
∇
f
(
x
k
)
(
x
k
+
1
−
x
k
)
(3)
f(x_{k+1}) \approx f(x_{k})+\nabla{f(x_{k})}(x_{k+1}-x_{k}) \tag {3}
f(xk+1)≈f(xk)+∇f(xk)(xk+1−xk)(3)
式(2)+(3)
f
(
x
k
+
1
)
=
f
(
x
k
)
−
α
(
∇
f
(
x
k
)
)
2
f
(
x
k
+
1
)
<
f
(
x
k
)
(4)
\begin{aligned} f(x_{k+1})=f(x_{k})-\alpha(\nabla{f(x_{k})})^{2} \\ f(x_{k+1})<f(x_{k}) \tag {4} \end{aligned}
f(xk+1)=f(xk)−α(∇f(xk))2f(xk+1)<f(xk)(4)
f
(
x
)
f(x)
f(x)二阶泰勒展开
f
^
(
x
;
x
k
)
≈
f
(
x
k
)
+
∇
f
(
x
k
)
(
x
−
x
k
)
+
∇
2
f
(
x
k
)
2
(
x
−
x
k
)
2
(5)
\widehat{f}(x;x_{k}) \approx f(x_{k})+\nabla{f(x_{k})} (x-x_{k})+ \frac{\nabla^2 {f(x_{k})}}{2} (x-x_{k})^2 \tag {5}
f
(x;xk)≈f(xk)+∇f(xk)(x−xk)+2∇2f(xk)(x−xk)2(5)
∇
f
(
x
k
)
\nabla f(x_{k})
∇f(xk)满足L-Lipschitz条件
∣
∇
f
(
x
)
−
∇
f
(
x
k
)
∣
≤
L
∣
x
−
x
k
∣
∇
2
f
(
x
k
)
=
∣
∇
f
(
x
)
−
∇
f
(
x
k
)
∣
∣
x
−
x
k
∣
≤
L
(6)
\begin{aligned} \left\vert \nabla f(x) - \nabla f(x_{k}) \right\vert \leq L\left\vert x - x_{k} \right\vert \\ \nabla^2 {f(x_{k})} = \frac{\left\vert \nabla f(x) - \nabla f(x_{k}) \right\vert}{\left\vert x - x_{k} \right\vert} \leq L \tag {6} \end{aligned}
∣∇f(x)−∇f(xk)∣≤L∣x−xk∣∇2f(xk)=∣x−xk∣∣∇f(x)−∇f(xk)∣≤L(6)
式(5)+(6)
f
^
(
x
;
x
k
)
≈
f
(
x
k
)
+
∇
f
(
x
k
)
(
x
−
x
k
)
+
L
2
(
x
−
x
k
)
2
(7)
\widehat{f}(x;x_{k}) \approx f(x_{k})+\nabla{f(x_{k})} (x-x_{k})+ \frac{L}{2} (x-x_{k})^2 \tag {7}
f
(x;xk)≈f(xk)+∇f(xk)(x−xk)+2L(x−xk)2(7)
式(7)配方
f
^
(
x
;
x
k
)
≈
L
2
(
(
x
−
x
k
)
2
+
2
L
∇
f
(
x
k
)
(
x
−
x
k
)
+
(
1
L
∇
f
(
x
k
)
)
2
)
−
1
2
L
(
∇
f
(
x
k
)
)
2
+
f
(
x
k
)
f
^
(
x
;
x
k
)
≈
L
2
(
x
−
x
k
+
1
L
∇
f
(
x
k
)
)
2
−
1
2
L
(
∇
f
(
x
k
)
)
2
+
f
(
x
k
)
(8)
\begin{aligned} \widehat{f}(x;x_{k}) \approx \frac{L}{2} ((x-x_{k})^2 + \frac{2}{L} \nabla{f(x_{k})} (x-x_{k}) + (\frac{1}{L} \nabla{f(x_{k})})^2) - \frac{1}{2L} (\nabla{f(x_{k})} )^2 +f(x_{k}) \\ \widehat{f}(x;x_{k}) \approx \frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2- \frac{1}{2L} (\nabla{f(x_{k})} )^2 +f(x_{k}) \tag {8} \end{aligned}
f
(x;xk)≈2L((x−xk)2+L2∇f(xk)(x−xk)+(L1∇f(xk))2)−2L1(∇f(xk))2+f(xk)f
(x;xk)≈2L(x−xk+L1∇f(xk))2−2L1(∇f(xk))2+f(xk)(8)
当
x
k
+
1
=
a
r
g
m
i
n
x
f
^
(
x
;
x
k
)
x_{k+1}=arg min_{x} \widehat{f}(x;x_{k})
xk+1=argminxf
(x;xk)时
x
k
+
1
=
x
k
+
1
L
∇
f
(
x
k
)
(9)
x_{k+1}=x_{k} +\frac{1}{L} \nabla{f(x_{k})} \tag {9}
xk+1=xk+L1∇f(xk)(9)
考虑
g
(
x
)
g(x)
g(x)非光滑
式(1)+(8)
Q
(
x
)
=
L
2
(
x
−
x
k
+
1
L
∇
f
(
x
k
)
)
2
+
h
(
x
k
)
+
g
(
x
)
h
(
x
k
)
=
f
(
x
k
)
−
1
2
L
(
∇
f
(
x
k
)
)
2
x
k
+
1
=
a
r
g
m
i
n
x
L
2
(
x
−
x
k
+
1
L
∇
f
(
x
k
)
)
2
+
g
(
x
)
(10)
\begin{aligned} Q(x)=\frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2+h(x_{k})+g(x) \\ h(x_{k})=f(x_{k})-\frac{1}{2L} (\nabla{f(x_{k})} )^2 \\ x_{k+1}=arg min_{x} \frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2+g(x) \tag {10} \end{aligned}
Q(x)=2L(x−xk+L1∇f(xk))2+h(xk)+g(x)h(xk)=f(xk)−2L1(∇f(xk))2xk+1=argminx2L(x−xk+L1∇f(xk))2+g(x)(10)
近端投影
p
r
o
x
g
(
x
)
(
z
)
=
a
r
g
m
i
n
x
1
2
(
x
−
z
)
2
+
g
(
x
)
(11)
prox_{g(x)}(z) = arg min_{x} \frac{1}{2} (x-z) ^2+g(x) \tag {11}
proxg(x)(z)=argminx21(x−z)2+g(x)(11)
注:上述内容仅为个人学习过程中的笔记,如有不当的地方还望指正
标签:xk,frac,vert,梯度,nabla,tag,理解,2L,近端 来源: https://blog.csdn.net/qq_24855055/article/details/111054893