其他分享
首页 > 其他分享> > 最优化学习笔记1——关于拟牛顿法推导

最优化学习笔记1——关于拟牛顿法推导

作者:互联网

从2020年3月份到现在,一年的时间里断断续续自学最优化,结合豆瓣读书、知乎等网站上的推荐,翻阅了以下书和课程:

关于最优化的书和课程非常多,其他书和课程包括但不限于:

侧重于应用方面的有:

值得一提的是,最优化和控制学科里面的最优控制联系紧密,是最优控制中直接法的数学理论基石,除了上述Lorenz T. Biegler的著作涉及最优控制外,侧重于最优控制求解的还有:

个人觉得,上述这些著作全部看完不必要也不现实,应该先选择1至2本入门,然后根据自己的研究需要选读。

扯完前面的废话,谈谈拟牛顿法,主要参考《Numerical Optimization》中的内容,根据第3章的描述,拟牛顿法最开始主要用于克服牛顿法计算无约束优化问题中的搜索方向的不足(有无约束和拟牛顿的思想没有太大关系)。以线搜索为例(对于信赖域法类似),我们想要
min ⁡ x f ( x ) (1) \min_x f(x)\tag{1} xmin​f(x)(1)

的方式是通过下述迭代过程:
x k + 1 = x k + α k p k (2) x_{k+1}=x_k+\alpha _kp_k\tag{2} xk+1​=xk​+αk​pk​(2)

式中: α k \alpha _k αk​为步长, p k p_k pk​为搜索方向。

这里不考虑步长 α k \alpha _k αk​的选取,下面主要考虑搜索方向 p k p_k pk​的选取。首先看牛顿法,需要求 f ( x ) f(x) f(x)对应的Hessian矩阵 ∇ 2 f \nabla ^2f ∇2f,给出的搜索方向为
p k = − ∇ 2 f k − 1 ∇ f k (3) p_k=-\nabla ^2f_{k}^{-1}\nabla f_k\tag{3} pk​=−∇2fk−1​∇fk​(3)

式中: ∇ 2 f k − 1 \nabla ^2f_{k}^{-1} ∇2fk−1​为 ∇ 2 f k \nabla ^2f_k ∇2fk​的逆矩阵,考虑到 ∇ 2 f k − 1 \nabla ^2f_{k}^{-1} ∇2fk−1​并不容易计算,拟牛顿法提出用下式代替牛顿法的搜索方向:
p k = − B k − 1 ∇ f k (4) p_k=-B_{k}^{-1}\nabla f_k\tag{4} pk​=−Bk−1​∇fk​(4)

式中: B k B_k Bk​一般选为对称正定矩阵,若能够直接获得 B k B_k Bk​的计算方式,那么就免去了求Hessian矩阵的繁琐工作。那么如何得到 B k B_k Bk​呢?

1. 拟牛顿法中的割线方程推导

考虑到牛顿法其实可以从对目标函数的二阶近似获得,即考虑在当前解 x k x_k xk​附近对目标函数的二阶近似:
m k ( p ) = f k + ∇ f k T p + 1 2 p T ∇ 2 f k p (5) m_k(p)=f_k+\nabla f_{k}^{\mathrm{T}}p+\frac{1}{2}p^{\mathrm{T}}\nabla ^2f_kp\tag{5} mk​(p)=fk​+∇fkT​p+21​pT∇2fk​p(5)
式中: p = x − x k p=x-x_k p=x−xk​。选择 p p p使得式(5)中的目标函数取极值,就得到了牛顿法的公式,那么一个自然的想法是,式(4)可看作是使得下述目标函数取极值:
m k ( p ) = f k + ∇ f k T p + 1 2 p T B k p (6) m_k(p)=f_k+\nabla f_{k}^{\mathrm{T}}p+\frac{1}{2}p^{\mathrm{T}}B_kp\tag{6} mk​(p)=fk​+∇fkT​p+21​pTBk​p(6)

即用对称正定矩阵 B k B_k Bk​代替Hessian矩阵 ∇ 2 f k \nabla ^2f_k ∇2fk​。由于直接获得 B k B_k Bk​的表达式不太容易,转而考虑迭代计算 B k B_k Bk​,即需要推导 B k + 1 B_{k+1} Bk+1​和 B k B_k Bk​的关系,根据式(6),到了 x k + 1 x_{k+1} xk+1​附近时,有
m k + 1 ( p ) = f k + 1 + ∇ f k + 1 T p + 1 2 p T B k + 1 p (7) m_{k+1}(p)=f_{k+1}+\nabla f_{k+1}^{\mathrm{T}}p+\frac{1}{2}p^{\mathrm{T}}B_{k+1}p\tag{7} mk+1​(p)=fk+1​+∇fk+1T​p+21​pTBk+1​p(7)

式中: p = x − x k + 1 p=x-x_{k+1} p=x−xk+1​。根据式(7),若 p = 0 p=0 p=0,即停留在 x k + 1 x_{k+1} xk+1​时,有 ∇ m k + 1 ( 0 ) = ∇ f k + 1 \nabla m_{k+1}(0)=\nabla f_{k+1} ∇mk+1​(0)=∇fk+1​,于是想到后退到 x k x_k xk​(对应 p = − α k p k p=-\alpha _kp_k p=−αk​pk​)时, m k + 1 m_{k+1} mk+1​的梯度应与 ∇ f k \nabla f_k ∇fk​相等(即 m k + 1 m_{k+1} mk+1​应能保证从 x k x_k xk​到 x k + 1 x_{k+1} xk+1​的过程中对 ∇ f \nabla f ∇f足够近似),从而有
∇ m k + 1 ( − α k p k ) = ∇ f k + 1 − α k B k + 1 p k = ∇ f k (8) \nabla m_{k+1}(-\alpha _kp_k)=\nabla f_{k+1}-\alpha _kB_{k+1}p_k=\nabla f_k\tag{8} ∇mk+1​(−αk​pk​)=∇fk+1​−αk​Bk+1​pk​=∇fk​(8)
也即
B k + 1 α k p k = ∇ f k + 1 − ∇ f k (9) B_{k+1}\alpha _kp_k=\nabla f_{k+1}-\nabla f_k\tag{9} Bk+1​αk​pk​=∇fk+1​−∇fk​(9)

定义 s k = x k + 1 − x k = α k p k s_k=x_{k+1}-x_k=\alpha _kp_k sk​=xk+1​−xk​=αk​pk​, y k = ∇ f k + 1 − ∇ f k y_k=\nabla f_{k+1}-\nabla f_k yk​=∇fk+1​−∇fk​,则式(9)可写为
B k + 1 s k = y k (10) B_{k+1}s_k=y_k\tag{10} Bk+1​sk​=yk​(10)

式(10)又称为割线方程(secant equation)。令 H k = B k − 1 H_k=B_{k}^{-1} Hk​=Bk−1​,则有互补形式的割线方程:
H k + 1 y k = s k (11) H_{k+1}y_k=s_k\tag{11} Hk+1​yk​=sk​(11)

另一方面,式(8)也可这么理解(参考《最优化理论与方法》),即直接考虑以 x x x为自变量(而不是引入额外的变量 p p p):
f ( x ) ≈ f ( x k + 1 ) + ∇ f k + 1 ( x − x k + 1 ) + 1 2 ( x − x k + 1 ) T B k + 1 ( x − x k + 1 ) (12) f(x)\approx f(x_{k+1})+\nabla f_{k+1}(x-x_{k+1})+\frac{1}{2}(x-x_{k+1})^{\mathrm{T}}B_{k+1}(x-x_{k+1})\tag{12} f(x)≈f(xk+1​)+∇fk+1​(x−xk+1​)+21​(x−xk+1​)TBk+1​(x−xk+1​)(12)

对上式两边求导,有
∇ f ( x ) ≈ ∇ f k + 1 + B k + 1 ( x − x k + 1 ) (13) \nabla f(x)\approx \nabla f_{k+1}+B_{k+1}(x-x_{k+1})\tag{13} ∇f(x)≈∇fk+1​+Bk+1​(x−xk+1​)(13)

上式中令 x = x k x=x_k x=xk​即可得到式(8)。

2.拟牛顿法中的矩阵递推

拟牛顿法中的矩阵递推关系具有比较强的构造痕迹,典型的方法包括秩一和秩二校正。在介绍具体内容之前,首先引入矩阵求逆的Sherman–Morrison公式:对于非奇异方阵 A A A,设 A ˉ = A + a b T \bar{A}=A+ab^{\mathrm{T}} Aˉ=A+abT,其中 a , b ∈ R n a,b\in \mathbb{R} ^n a,b∈Rn,若 A ˉ \bar{A} Aˉ非奇异,则有
A ˉ − 1 = A − 1 − A − 1 a b T A − 1 1 + b T A − 1 a (14) \bar{A}^{-1}=A^{-1}-\frac{A^{-1}ab^{\mathrm{T}}A^{-1}}{1+b^{\mathrm{T}}A^{-1}a}\tag{14} Aˉ−1=A−1−1+bTA−1aA−1abTA−1​(14)

2.1 秩一校正

为了保证对称性,首先考虑构造如下关系(秩一校正):
B k + 1 = B k + σ v v T (15) B_{k+1}=B_k+\sigma vv^{\mathrm{T}}\tag{15} Bk+1​=Bk​+σvvT(15)

式中: σ \sigma σ取值为 1 1 1或者 − 1 -1 −1,且选择 σ \sigma σ和 v v v使得 B k + 1 B_{k+1} Bk+1​满足割线方程(10),于是有
y k = B k s k + [ σ v T s k ] v ⇒ [ σ v T s k ] v = y k − B k s k (16) y_k=B_ks_k+\left[ \sigma v^{\mathrm{T}}s_k \right] v\Rightarrow \left[ \sigma v^{\mathrm{T}}s_k \right] v=y_k-B_ks_k\tag{16} yk​=Bk​sk​+[σvTsk​]v⇒[σvTsk​]v=yk​−Bk​sk​(16)

由于 σ v T s k \sigma v^{\mathrm{T}}s_k σvTsk​为标量,因此 v v v必定为 y k − B k s k y_k-B_ks_k yk​−Bk​sk​的倍数,即存在标量 δ \delta δ使得 v = δ ( y k − B k s k ) v=\delta\left( y_k-B_ks_k \right) v=δ(yk​−Bk​sk​),则有
y k − B k s k = σ δ 2 [ s k T ( y k − B k s k ) ] ( y k − B k s k ) (17) y_k-B_ks_k=\sigma \delta ^2\left[ s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right] \left( y_k-B_ks_k \right)\tag{17} yk​−Bk​sk​=σδ2[skT​(yk​−Bk​sk​)](yk​−Bk​sk​)(17)

为确保上式成立,可选取
σ = s i g n [ s k T ( y k − B k s k ) ] , δ = ± ∣ s k T ( y k − B k s k ) ∣ − 1 2 (18) \sigma =\mathrm{sign}\left[ s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right] , \delta =\pm \left| s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right|^{-\frac{1}{2}}\tag{18} σ=sign[skT​(yk​−Bk​sk​)],δ=±∣∣​skT​(yk​−Bk​sk​)∣∣​−21​(18)

则 v = ± ∣ s k T ( y k − B k s k ) ∣ − 1 2 ( y k − B k s k ) v=\pm \left| s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right|^{-\frac{1}{2}}\left( y_k-B_ks_k \right) v=±∣∣​skT​(yk​−Bk​sk​)∣∣​−21​(yk​−Bk​sk​),结合式(15)和(18)可得
B k + 1 = B k + σ v v T = B k + s i g n [ s k T ( y k − B k s k ) ] × ∣ s k T ( y k − B k s k ) ∣ − 1 ( y k − B k s k ) ( y k − B k s k ) T = B k + ( y k − B k s k ) ( y k − B k s k ) T ( y k − B k s k ) T s k (19) \begin{aligned} B_{k+1}&=B_k+\sigma vv^{\mathrm{T}}\\ &=B_k+\mathrm{sign}\left[ s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right] \times \left| s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right|^{-1}\left( y_k-B_ks_k \right) \left( y_k-B_ks_k \right) ^{\mathrm{T}}\\ &=B_k+\frac{\left( y_k-B_ks_k \right) \left( y_k-B_ks_k \right) ^{\mathrm{T}}}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k} \end{aligned}\tag{19} Bk+1​​=Bk​+σvvT=Bk​+sign[skT​(yk​−Bk​sk​)]×∣∣​skT​(yk​−Bk​sk​)∣∣​−1(yk​−Bk​sk​)(yk​−Bk​sk​)T=Bk​+(yk​−Bk​sk​)Tsk​(yk​−Bk​sk​)(yk​−Bk​sk​)T​​(19)

根据式(19),取 a = y k − B k s k ( y k − B k s k ) T s k a=\frac{y_k-B_ks_k}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k} a=(yk​−Bk​sk​)Tsk​yk​−Bk​sk​​,    b = y k − B k s k \,\,b=y_k-B_ks_k b=yk​−Bk​sk​,且考虑到 H k H_k Hk​和 B k B_k Bk​为互逆关系,且均为对称矩阵,则由Sherman–Morrison公式(14)有
H k + 1 = H k − H k y k − B k s k ( y k − B k s k ) T s k ( y k − B k s k ) T H k 1 + ( y k − B k s k ) T H k y k − B k s k ( y k − B k s k ) T s k = H k − ( H k y k − s k ) ( H k y k − s k ) T ( y k − B k s k ) T s k + ( y k − B k s k ) T ( H k y k − s k ) = H k + ( s k − H k y k ) ( s k − H k y k ) T − ( y k − B k s k ) T s k + ( s k − H k y k ) T ( y k − B k s k ) (20) \begin{aligned} H_{k+1}&=H_k-\frac{H_k\frac{y_k-B_ks_k}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k}\left( y_k-B_ks_k \right) ^{\mathrm{T}}H_k}{1+\left( y_k-B_ks_k \right) ^{\mathrm{T}}H_k\frac{y_k-B_ks_k}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k}}\\ &=H_k-\frac{\left( H_ky_k-s_k \right) \left( H_ky_k-s_k \right) ^{\mathrm{T}}}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k+\left( y_k-B_ks_k \right) ^{\mathrm{T}}\left( H_ky_k-s_k \right)}\\ &=H_k+\frac{\left( s_k-H_ky_k \right) \left( s_k-H_ky_k \right) ^{\mathrm{T}}}{-\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}\left( y_k-B_ks_k \right)} \end{aligned}\tag{20} Hk+1​​=Hk​−1+(yk​−Bk​sk​)THk​(yk​−Bk​sk​)Tsk​yk​−Bk​sk​​Hk​(yk​−Bk​sk​)Tsk​yk​−Bk​sk​​(yk​−Bk​sk​)THk​​=Hk​−(yk​−Bk​sk​)Tsk​+(yk​−Bk​sk​)T(Hk​yk​−sk​)(Hk​yk​−sk​)(Hk​yk​−sk​)T​=Hk​+−(yk​−Bk​sk​)Tsk​+(sk​−Hk​yk​)T(yk​−Bk​sk​)(sk​−Hk​yk​)(sk​−Hk​yk​)T​​(20)

考虑到
− ( y k − B k s k ) T s k + ( s k − H k y k ) T ( y k − B k s k ) = − y k T s k + s k T B k s k + ( s k − H k y k ) T y k − ( s k − H k y k ) T B k s k = − y k T s k + s k T B k s k + ( s k − H k y k ) T y k − s k T B k s k + y k T s k = ( s k − H k y k ) T y k (21) \begin{aligned} & -\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}\left( y_k-B_ks_k \right) \\ =&-y_{k}^{\mathrm{T}}s_k+s_{k}^{\mathrm{T}}B_ks_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k-\left( s_k-H_ky_k \right) ^{\mathrm{T}}B_ks_k\\ =&-y_{k}^{\mathrm{T}}s_k+s_{k}^{\mathrm{T}}B_ks_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k-s_{k}^{\mathrm{T}}B_ks_k+y_{k}^{\mathrm{T}}s_k\\ =&\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k \end{aligned}\tag{21} ===​−(yk​−Bk​sk​)Tsk​+(sk​−Hk​yk​)T(yk​−Bk​sk​)−ykT​sk​+skT​Bk​sk​+(sk​−Hk​yk​)Tyk​−(sk​−Hk​yk​)TBk​sk​−ykT​sk​+skT​Bk​sk​+(sk​−Hk​yk​)Tyk​−skT​Bk​sk​+ykT​sk​(sk​−Hk​yk​)Tyk​​(21)
于是有

H k + 1 = H k + ( s k − H k y k ) ( s k − H k y k ) T ( s k − H k y k ) T y k (22) H_{k+1}=H_k+\frac{\left( s_k-H_ky_k \right) \left( s_k-H_ky_k \right) ^{\mathrm{T}}}{\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k}\tag{22} Hk+1​=Hk​+(sk​−Hk​yk​)Tyk​(sk​−Hk​yk​)(sk​−Hk​yk​)T​(22)

值得一提的是,如果利用式(11)对 H k H_k Hk​进行类似式(15)-(18)的推导(相当于 H k H_k Hk​和 B k B_k Bk​、 y k y_k yk​和 s k s_k sk​互换),也可得到式(22),式(22)与(19)正好构成了互补关系。

2.2 秩二校正(DFP和BFGS)

考虑构造如下关系:
B k + 1 = B k + σ 1 u u T + σ 2 v v T (23) B_{k+1}=B_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\tag{23} Bk+1​=Bk​+σ1​uuT+σ2​vvT(23)

式中: σ 1 \sigma _1 σ1​和 σ 2 \sigma _2 σ2​为标量, u u u和 v v v为 n n n维向量,则由割线方程(10)有

y k = B k s k + [ σ 1 u T s k ] u + [ σ 2 v T s k ] v (24) y_k=B_ks_k+\left[ \sigma _1u^{\mathrm{T}}s_k \right] u+\left[ \sigma _2v^{\mathrm{T}}s_k \right] v\tag{24} yk​=Bk​sk​+[σ1​uTsk​]u+[σ2​vTsk​]v(24)

选取 u = y k u=y_k u=yk​, [ σ 1 u T s k ] = 1 \left[ \sigma _1u^{\mathrm{T}}s_k \right] =1 [σ1​uTsk​]=1, v = B k s k v=B_ks_k v=Bk​sk​, [ σ 2 v T s k ] = − 1 \left[ \sigma _2v^{\mathrm{T}}s_k \right] =-1 [σ2​vTsk​]=−1可使得式(24)成立,于是 σ 1 = 1 y k T s k \sigma _1=\frac{1}{y_{k}^{\mathrm{T}}s_k} σ1​=ykT​sk​1​, σ 2 = − 1 s k T B k s k \sigma _2=-\frac{1}{s_{k}^{\mathrm{T}}B_ks_k} σ2​=−skT​Bk​sk​1​,则有
B k + 1 = B k + σ 1 u u T + σ 2 v v T = B k + y k y k T y k T s k − B k s k s k T B k s k T B k s k (25) \begin{aligned} B_{k+1}&=B_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\\ &=B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}-\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k} \end{aligned} \tag{25} Bk+1​​=Bk​+σ1​uuT+σ2​vvT=Bk​+ykT​sk​yk​ykT​​−skT​Bk​sk​Bk​sk​skT​Bk​​​(25)

式(25)即为关于 B k B_k Bk​的BFGS校正,进一步,令 a = y k y k T s k a=\frac{y_k}{y_{k}^{\mathrm{T}}s_k} a=ykT​sk​yk​​, b = y k b=y_k b=yk​,则由Sherman–Morrison公式(14)有
( B k + y k y k T y k T s k ) − 1 = B k − 1 − B k − 1 y k y k T y k T s k B k − 1 1 + y k T B k − 1 y k y k T s k = B k − 1 − B k − 1 y k y k T B k − 1 y k T s k + y k T B k − 1 y k = H k − H k y k y k T H k y k T s k + y k T H k y k (26) \begin{aligned} \left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}&=B_{k}^{-1}-\frac{B_{k}^{-1}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}B_{k}^{-1}}{1+y_{k}^{\mathrm{T}}B_{k}^{-1}\frac{y_k}{y_{k}^{\mathrm{T}}s_k}}\\ &=B_{k}^{-1}-\frac{B_{k}^{-1}y_ky_{k}^{\mathrm{T}}B_{k}^{-1}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}B_{k}^{-1}y_k}\\ &=H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \end{aligned}\tag{26} (Bk​+ykT​sk​yk​ykT​​)−1​=Bk−1​−1+ykT​Bk−1​ykT​sk​yk​​Bk−1​ykT​sk​yk​ykT​​Bk−1​​=Bk−1​−ykT​sk​+ykT​Bk−1​yk​Bk−1​yk​ykT​Bk−1​​=Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​​(26)

再一次,令 a = − B k s k s k T B k s k a=-\frac{B_ks_k}{s_{k}^{\mathrm{T}}B_ks_k} a=−skT​Bk​sk​Bk​sk​​, b = B k s k b=B_ks_k b=Bk​sk​,则由Sherman–Morrison公式(14)有
( B k + y k y k T y k T s k − B k s k s k T B k s k T B k s k ) − 1 = ( B k + y k y k T y k T s k ) − 1 + ( B k + y k y k T y k T s k ) − 1 B k s k s k T B k s k T B k s k ( B k + y k y k T y k T s k ) − 1 1 − s k T B k ( B k + y k y k T y k T s k ) − 1 B k s k s k T B k s k = ( B k + y k y k T y k T s k ) − 1 + ( B k + y k y k T y k T s k ) − 1 B k s k s k T B k ( B k + y k y k T y k T s k ) − 1 s k T B k s k − s k T B k ( B k + y k y k T y k T s k ) − 1 B k s k = H k − H k y k y k T H k y k T s k + y k T H k y k + ( H k − H k y k y k T H k y k T s k + y k T H k y k ) B k s k s k T B k ( H k − H k y k y k T H k y k T s k + y k T H k y k ) s k T B k s k − s k T B k ( H k − H k y k y k T H k y k T s k + y k T H k y k ) B k s k = H k − H k y k y k T H k y k T s k + y k T H k y k + ( I − H k y k y k T y k T s k + y k T H k y k ) s k s k T ( I − y k y k T H k y k T s k + y k T H k y k ) s k T B k s k − s k T ( I − y k y k T H k y k T s k + y k T H k y k ) B k s k = H k − H k y k y k T H k y k T s k + y k T H k y k + ( I − H k y k y k T y k T s k + y k T H k y k ) s k s k T ( I − y k y k T H k y k T s k + y k T H k y k ) s k T y k y k T y k T s k + y k T H k y k s k (27) \begin{aligned} &\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}-\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k} \right) ^{-1}\\ =&\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}+\frac{\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k}\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}}{1-s_{k}^{\mathrm{T}}B_k\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}\frac{B_ks_k}{s_{k}^{\mathrm{T}}B_ks_k}}\\ =&\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}+\frac{\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}B_ks_ks_{k}^{\mathrm{T}}B_k\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}}{s_{k}^{\mathrm{T}}B_ks_k-s_{k}^{\mathrm{T}}B_k\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}B_ks_k}\\ =&H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}+\frac{\left( H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) B_ks_ks_{k}^{\mathrm{T}}B_k\left( H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}B_ks_k-s_{k}^{\mathrm{T}}B_k\left( H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) B_ks_k}\\ =&H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}+\frac{\left( I-\frac{H_ky_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) s_ks_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}B_ks_k-s_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) B_ks_k}\\ =&H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}+\frac{\left( I-\frac{H_ky_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) s_ks_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}s_k} \end{aligned}\tag{27} =====​(Bk​+ykT​sk​yk​ykT​​−skT​Bk​sk​Bk​sk​skT​Bk​​)−1(Bk​+ykT​sk​yk​ykT​​)−1+1−skT​Bk​(Bk​+ykT​sk​yk​ykT​​)−1skT​Bk​sk​Bk​sk​​(Bk​+ykT​sk​yk​ykT​​)−1skT​Bk​sk​Bk​sk​skT​Bk​​(Bk​+ykT​sk​yk​ykT​​)−1​(Bk​+ykT​sk​yk​ykT​​)−1+skT​Bk​sk​−skT​Bk​(Bk​+ykT​sk​yk​ykT​​)−1Bk​sk​(Bk​+ykT​sk​yk​ykT​​)−1Bk​sk​skT​Bk​(Bk​+ykT​sk​yk​ykT​​)−1​Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​+skT​Bk​sk​−skT​Bk​(Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​)Bk​sk​(Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​)Bk​sk​skT​Bk​(Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​)​Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​+skT​Bk​sk​−skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)Bk​sk​(I−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​​)sk​skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)​Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​+skT​ykT​sk​+ykT​Hk​yk​yk​ykT​​sk​(I−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​​)sk​skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)​​(27)

对于式(27)的最后一项,有
( I − H k y k y k T y k T s k + y k T H k y k ) s k s k T ( I − y k y k T H k y k T s k + y k T H k y k ) s k T y k y k T y k T s k + y k T H k y k s k = ( y k T s k + y k T H k y k − H k y k y k T ) s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k ( y k T s k + y k T H k y k ) = ( y k T s k + y k T H k y k ) s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k ( y k T s k + y k T H k y k ) − H k y k y k T s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k ( y k T s k + y k T H k y k ) = s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k − H k y k y k T s k s k T s k T y k y k T s k + H k y k y k T s k s k T y k y k T H k s k T y k y k T s k ( y k T s k + y k T H k y k ) = s k s k T s k T y k + s k s k T y k T H k y k s k T y k y k T s k − s k y k T H k s k T y k − H k y k s k T s k T y k + H k y k y k T H k y k T s k + y k T H k y k (28) \begin{aligned} &\frac{\left( I-\frac{H_ky_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) s_ks_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}s_k}\\ =&\frac{\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-H_ky_ky_{k}^{\mathrm{T}} \right) s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}\\ =&\frac{\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right) s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}-\frac{H_ky_ky_{k}^{\mathrm{T}}s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}\\ =&\frac{s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}-\frac{H_ky_ky_{k}^{\mathrm{T}}s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}+\frac{H_ky_ky_{k}^{\mathrm{T}}s_ks_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}H_k}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}\\ =&\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}y_{k}^{\mathrm{T}}H_ky_k}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}-\frac{s_ky_{k}^{\mathrm{T}}H_k}{s_{k}^{\mathrm{T}}y_k}-\frac{H_ky_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \end{aligned}\tag{28} ====​skT​ykT​sk​+ykT​Hk​yk​yk​ykT​​sk​(I−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​​)sk​skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)​skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)(ykT​sk​+ykT​Hk​yk​−Hk​yk​ykT​)sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)(ykT​sk​+ykT​Hk​yk​)sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​−skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)Hk​yk​ykT​sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​skT​yk​ykT​sk​sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​−skT​yk​ykT​sk​Hk​yk​ykT​sk​skT​​+skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)Hk​yk​ykT​sk​skT​yk​ykT​Hk​​skT​yk​sk​skT​​+skT​yk​ykT​sk​sk​skT​ykT​Hk​yk​​−skT​yk​sk​ykT​Hk​​−skT​yk​Hk​yk​skT​​+ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​​(28)

结合式(27)和(28)可得
H k + 1 = ( B k + y k y k T y k T s k − B k s k s k T B k s k T B k s k ) − 1 = H k + s k s k T s k T y k + s k s k T y k T H k y k s k T y k y k T s k − s k y k T H k s k T y k − H k y k s k T s k T y k = ( I − s k y k T s k T y k ) H k + s k y k T H k y k s k T ( s k T y k ) 2 − H k y k s k T s k T y k + s k s k T s k T y k = ( I − s k y k T s k T y k ) H k − ( I − s k y k T s k T y k ) H k y k s k T s k T y k + s k s k T s k T y k = ( I − s k y k T s k T y k ) H k ( I − y k s k T s k T y k ) + s k s k T s k T y k (29) \begin{aligned} H_{k+1}&=\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}-\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k} \right) ^{-1}\\ &=H_k+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}y_{k}^{\mathrm{T}}H_ky_k}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}-\frac{s_ky_{k}^{\mathrm{T}}H_k}{s_{k}^{\mathrm{T}}y_k}-\frac{H_ky_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\\ &=\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k+\frac{s_ky_{k}^{\mathrm{T}}H_ky_ks_{k}^{\mathrm{T}}}{\left( s_{k}^{\mathrm{T}}y_k \right) ^2}-\frac{H_ky_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\\ &=\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k-\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k\frac{y_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\\ &=\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k\left( I-\frac{y_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) +\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \end{aligned}\tag{29} Hk+1​​=(Bk​+ykT​sk​yk​ykT​​−skT​Bk​sk​Bk​sk​skT​Bk​​)−1=Hk​+skT​yk​sk​skT​​+skT​yk​ykT​sk​sk​skT​ykT​Hk​yk​​−skT​yk​sk​ykT​Hk​​−skT​yk​Hk​yk​skT​​=(I−skT​yk​sk​ykT​​)Hk​+(skT​yk​)2sk​ykT​Hk​yk​skT​​−skT​yk​Hk​yk​skT​​+skT​yk​sk​skT​​=(I−skT​yk​sk​ykT​​)Hk​−(I−skT​yk​sk​ykT​​)Hk​skT​yk​yk​skT​​+skT​yk​sk​skT​​=(I−skT​yk​sk​ykT​​)Hk​(I−skT​yk​yk​skT​​)+skT​yk​sk​skT​​​(29)

式(29)即为关于 H k H_k Hk​的BFGS校正。
若一开始选择构造如下关系:
H k + 1 = H k + σ 1 u u T + σ 2 v v T (30) H_{k+1}=H_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\tag{30} Hk+1​=Hk​+σ1​uuT+σ2​vvT(30)

则由割线方程(11)有
s k = H k y k + [ σ 1 u T y k ] u + [ σ 2 v T y k ] v (31) s_k=H_ky_k+\left[ \sigma _1u^{\mathrm{T}}y_k \right] u+\left[ \sigma _2v^{\mathrm{T}}y_k \right] v\tag{31} sk​=Hk​yk​+[σ1​uTyk​]u+[σ2​vTyk​]v(31)

选取 u = s k u=s_k u=sk​, [ σ 1 u T y k ] = 1 \left[ \sigma _1u^{\mathrm{T}}y_k \right] =1 [σ1​uTyk​]=1, v = H k y k v=H_ky_k v=Hk​yk​, [ σ 2 v T y k ] = − 1 \left[ \sigma _2v^{\mathrm{T}}y_k \right] =-1 [σ2​vTyk​]=−1可使得式(31)成立,于是 σ 1 = 1 s k T y k \sigma _1=\frac{1}{s_{k}^{\mathrm{T}}y_k} σ1​=skT​yk​1​, σ 2 = − 1 y k T H k y k \sigma _2=-\frac{1}{y_{k}^{\mathrm{T}}H_ky_k} σ2​=−ykT​Hk​yk​1​,则有
H k + 1 = H k + σ 1 u u T + σ 2 v v T = H k + s k s k T s k T y k − H k y k y k T H k y k T H k y k (32) \begin{aligned} H_{k+1}&=H_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\\ &=H_k+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}H_ky_k} \end{aligned}\tag{32} Hk+1​​=Hk​+σ1​uuT+σ2​vvT=Hk​+skT​yk​sk​skT​​−ykT​Hk​yk​Hk​yk​ykT​Hk​​​(32)

式(32)即为关于 H k H_k Hk​的DFP校正。
类似(27)-(29)的推导,对式(32)利用Sherman–Morrison公式(14)可得
   B k + 1 = ( I − y k s k T s k T y k ) B k ( I − s k y k T s k T y k ) + y k y k T s k T y k (33) \,\, B_{k+1}=\left( I-\frac{y_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) B_k\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) +\frac{y_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\tag{33} Bk+1​=(I−skT​yk​yk​skT​​)Bk​(I−skT​yk​sk​ykT​​)+skT​yk​yk​ykT​​(33)

式(33)即为关于 B k B_k Bk​的DFP校正。
对比式(25)、(29)、(32)和(33)可知,将 H k H_k Hk​和 B k B_k Bk​、 y k y_k yk​和 s k s_k sk​互换,则式(25)和(32)、(29)和(33)可互相推导得到。根据《最优化理论与方法》中的描述,给出任一个拟牛顿校正 H k + 1 H_{k+1} Hk+1​,通过交换 H ↔ B H\leftrightarrow B H↔B, s ↔ y s\leftrightarrow y s↔y,可以得到其关于 B B B的对偶校正 B k + 1 B_{k+1} Bk+1​,再利用Sherman–Morrison公式,就可得到关于 H H H的对偶校正 H k + 1 D H_{k+1}^{D} Hk+1D​。DFP和BFGS校正正好满足这种对偶关系:

图1 DFP和BFGS的对偶关系 \text{图1 DFP和BFGS的对偶关系} 图1 DFP和BFGS的对偶关系

值得指出的是,对于秩一校正,对式(19)进行交换操作可得到式(22),再求逆又得到了式(19),因此秩一校正是自对偶的。

2.3 DFP校正的最优性

根据《Numerical Optimization》第6章的描述,DFP校正(33)是下述优化问题的最优解:
min ⁡ B ∥ B − B k ∥ W s u b j e c t t o B = B T , B s k = y k (34) \min_B \left\| B-B_k \right\| _W\\\mathrm{subject} \mathrm{to} B=B^{\mathrm{T}}, Bs_k=y_k\tag{34} Bmin​∥B−Bk​∥W​subjecttoB=BT,Bsk​=yk​(34)

式中: B k B_k Bk​为对称正定阵,对于方阵 A A A, ∥ A ∥ W ≡ ∥ W 1 / 2 A W 1 / 2 ∥ F \left\| A \right\| _W\equiv \left\| W^{1/2}AW^{1/2} \right\| _F ∥A∥W​≡∥∥​W1/2AW1/2∥∥​F​, ∥ A ∥ F 2 ≡ ∑ i = 1 n ∑ j = 1 n a i j 2 \left\| A \right\| _{F}^{2}\equiv \sum_{i=1}^n{\sum_{j=1}^n{a_{ij}^{2}}} ∥A∥F2​≡∑i=1n​∑j=1n​aij2​为矩阵的F-范数, W = G ˉ k − 1 W=\bar{G}_{k}^{-1} W=Gˉk−1​, G ˉ k = ∫ 0 1 ∇ 2 f ( x k + τ α k p k ) d τ \bar{G}_k=\int_0^1{\nabla ^2f\left( x_k+\tau \alpha _kp_k \right) \mathrm{d}\tau} Gˉk​=∫01​∇2f(xk​+ταk​pk​)dτ为平均Hessian阵(故 W W W为对称阵),根据Taylor展开 ∇ f ( x + p ) = ∇ f ( x ) + ∫ 0 1 ∇ 2 f ( x + t p ) p d t \nabla f\left( x+p \right) =\nabla f\left( x \right) +\int_0^1{\nabla ^2f(x+tp)p\mathrm{d}t} ∇f(x+p)=∇f(x)+∫01​∇2f(x+tp)pdt可知
y k = G ˉ k α k p k = G ˉ k s k (35) y_k=\bar{G}_k\alpha _kp_k=\bar{G}_ks_k\tag{35} yk​=Gˉk​αk​pk​=Gˉk​sk​(35)

书中没有给出详细证明,在math.stackexchange.com网站上有高手进行了推导,这里进行整理总结。
首先,原优化问题(34)的目标函数为 min ⁡ B ∥ W 1 / 2 B W 1 / 2 − W 1 / 2 B k W 1 / 2 ∥ F \displaystyle{\min_B} \left\| W^{1/2}BW^{1/2}-W^{1/2}B_kW^{1/2} \right\| _F Bmin​∥∥∥​W1/2BW1/2−W1/2Bk​W1/2∥∥∥​F​,记 B ^ = W 1 / 2 B W 1 / 2 \hat{B}=W^{1/2}BW^{1/2} B^=W1/2BW1/2, B ^ k = W 1 / 2 B k W 1 / 2 \hat{B}_k=W^{1/2}B_kW^{1/2} B^k​=W1/2Bk​W1/2,由式(35)可知
W y k = s k ⇔ W − 1 / 2 W y k = W − 1 / 2 s k ⇔ W 1 / 2 y k = W − 1 / 2 s k (36) Wy_k=s_k\Leftrightarrow W^{-1/2}Wy_k=W^{-1/2}s_k\Leftrightarrow W^{1/2}y_k=W^{-1/2}s_k\tag{36} Wyk​=sk​⇔W−1/2Wyk​=W−1/2sk​⇔W1/2yk​=W−1/2sk​(36)

记 y ^ k = W 1 / 2 y k \hat{y}_k=W^{1/2}y_k y^​k​=W1/2yk​, s ^ k = W − 1 / 2 s k \hat{s}_k=W^{-1/2}s_k s^k​=W−1/2sk​,则 y ^ k = s ^ k \hat{y}_k=\hat{s}_k y^​k​=s^k​,又由于
B s k = y k ⇔ W 1 / 2 B W 1 / 2 W − 1 / 2 s k = W 1 / 2 y k ⇔ B ^ s ^ k = y ^ k ⇔ B ^ s ^ k = s ^ k (37) Bs_k=y_k\Leftrightarrow W^{1/2}BW^{1/2}W^{-1/2}s_k=W^{1/2}y_k\Leftrightarrow \hat{B}\hat{s}_k=\hat{y}_k\Leftrightarrow \hat{B}\hat{s}_k=\hat{s}_k\tag{37} Bsk​=yk​⇔W1/2BW1/2W−1/2sk​=W1/2yk​⇔B^s^k​=y^​k​⇔B^s^k​=s^k​(37)

则原优化问题(34)可写为
min ⁡ B ^ ∥ B ^ − B ^ k ∥ F s u b j e c t t o B ^ s ^ k = s ^ k (38) \min_{\hat{B}} \left\| \hat{B}-\hat{B}_k \right\| _F\\\mathrm{subject} \mathrm{to} \hat{B}\hat{s}_k=\hat{s}_k\tag{38} B^min​∥∥∥​B^−B^k​∥∥∥​F​subjecttoB^s^k​=s^k​(38)

可见 s ^ k \hat{s}_k s^k​为 B ^ \hat{B} B^的特征向量,由于 B ^ \hat{B} B^为对称矩阵(因此为正规矩阵,必酉相似于对角阵),构造酉矩阵 U = [ u ∣ u ⊥ ] U=\left[ u|u_{\bot} \right] U=[u∣u⊥​],其中 u = s ^ k ∥ s ^ k ∥ u=\frac{\hat{s}_k}{\left\| \hat{s}_k \right\|} u=∥s^k​∥s^k​​, u ⊥ u_{\bot} u⊥​为 u u u的正交补,则 u T B ^ u = u T u = 1 u^{\mathrm{T}}\hat{B}u=u^{\mathrm{T}}u=1 uTB^u=uTu=1, u ⊥ T B ^ u = u ⊥ T u = 0 u_{\bot}^{\mathrm{T}}\hat{B}u=u_{\bot}^{\mathrm{T}}u=0 u⊥T​B^u=u⊥T​u=0,从而有
U T B ^ k U − U T B ^ U = [ u T u ⊥ T ] B ^ k [ u u ⊥ ] − [ u T u ⊥ T ] B ^ [ u u ⊥ ] = [ u T B ^ k u u T B ^ k u ⊥ u ⊥ T B ^ k u u ⊥ T B ^ k u ⊥ ] − [ 1 0 0 u ⊥ T B ^ u ⊥ ] (39) \begin{aligned} U^{\mathrm{T}}\hat{B}_kU-U^{\mathrm{T}}\hat{B}U=&\left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \hat{B}_k\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] -\left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \hat{B}\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] \\ =&\left[ \begin{matrix} u^{\mathrm{T}}\hat{B}_ku& u^{\mathrm{T}}\hat{B}_ku_{\bot}\\ u_{\bot}^{\mathrm{T}}\hat{B}_ku& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}\\\end{matrix} \right] -\left[ \begin{matrix} 1& 0\\ 0& u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot}\\\end{matrix} \right] \end{aligned}\tag{39} UTB^k​U−UTB^U==​[uTu⊥T​​]B^k​[u​u⊥​​]−[uTu⊥T​​]B^[u​u⊥​​][uTB^k​uu⊥T​B^k​u​uTB^k​u⊥​u⊥T​B^k​u⊥​​]−[10​0u⊥T​B^u⊥​​]​(39)

考虑到酉相似矩阵的具有相同的F-范数,因此
∥ B ^ − B ^ k ∥ F 2 = ∥ U T ( B ^ k − B ^ ) U ∥ F 2 = ∥ [ u T B ^ k u − 1 u T B ^ k u ⊥ u ⊥ T B ^ k u u ⊥ T B ^ k u ⊥ − u ⊥ T B ^ u ⊥ ] ∥ F 2 = ( u T B ^ k u − 1 ) 2 + ∥ u T B ^ k u ⊥ ∥ F 2 + ∥ u ⊥ T B ^ k u ∥ F 2 + ∥ u ⊥ T B ^ k u ⊥ − u ⊥ T B ^ u ⊥ ∥ F 2 (40) \begin{aligned} \left\| \hat{B}-\hat{B}_k \right\| _{F}^{2}=&\left\| U^{\mathrm{T}}(\hat{B}_k-\hat{B})U \right\| _{F}^{2}\\ =&\left\| \left[ \begin{matrix} u^{\mathrm{T}}\hat{B}_ku-1& u^{\mathrm{T}}\hat{B}_ku_{\bot}\\ u_{\bot}^{\mathrm{T}}\hat{B}_ku& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}-u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot}\\\end{matrix} \right] \right\| _{F}^{2}\\ =&(u^{\mathrm{T}}\hat{B}_ku-1)^2+\left\| u^{\mathrm{T}}\hat{B}_ku_{\bot} \right\| _{F}^{2}+\left\| u_{\bot}^{\mathrm{T}}\hat{B}_ku \right\| _{F}^{2}+\left\| u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}-u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot} \right\| _{F}^{2} \end{aligned}\tag{40} ∥∥∥​B^−B^k​∥∥∥​F2​===​∥∥∥​UT(B^k​−B^)U∥∥∥​F2​∥∥∥∥​[uTB^k​u−1u⊥T​B^k​u​uTB^k​u⊥​u⊥T​B^k​u⊥​−u⊥T​B^u⊥​​]∥∥∥∥​F2​(uTB^k​u−1)2+∥∥∥​uTB^k​u⊥​∥∥∥​F2​+∥∥∥​u⊥T​B^k​u∥∥∥​F2​+∥∥∥​u⊥T​B^k​u⊥​−u⊥T​B^u⊥​∥∥∥​F2​​(40)

式(40)右端只有最后一项与 B ^ \hat{B} B^有关(也即与 B B B有关),因此最优解必定满足 u ⊥ T B ^ u ⊥ = u ⊥ T B ^ k u ⊥ u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot}=u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot} u⊥T​B^u⊥​=u⊥T​B^k​u⊥​,根据式(39)可知
B ^ = U [ 1 0 0 u ⊥ T B ^ k u ⊥ ] U T = [ u u ⊥ ] [ 1 0 0 u ⊥ T B ^ k u ⊥ ] [ u T u ⊥ T ] = u u T + u ⊥ u ⊥ T B ^ k u ⊥ u ⊥ T = u u T + ( I − u u T ) B ^ k ( I − u u T ) (41) \begin{aligned} \hat{B}=&U\left[ \begin{matrix} 1& 0\\ 0& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}\\\end{matrix} \right] U^{\mathrm{T}}\\ =&\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] \left[ \begin{matrix} 1& 0\\ 0& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}\\\end{matrix} \right] \left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \\ =&uu^{\mathrm{T}}+u_{\bot}u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}u_{\bot}^{\mathrm{T}}\\ =&uu^{\mathrm{T}}+(I-uu^{\mathrm{T}})\hat{B}_k(I-uu^{\mathrm{T}}) \end{aligned}\tag{41} B^====​U[10​0u⊥T​B^k​u⊥​​]UT[u​u⊥​​][10​0u⊥T​B^k​u⊥​​][uTu⊥T​​]uuT+u⊥​u⊥T​B^k​u⊥​u⊥T​uuT+(I−uuT)B^k​(I−uuT)​(41)

式(41)中用到了如下关系式:
I = U U T = [ u u ⊥ ] [ u T u ⊥ T ] = u u T + u ⊥ u ⊥ T ⇔ u ⊥ u ⊥ T = I − u u T (42) \begin{aligned} I=&UU^{\mathrm{T}}\\ =&\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] \left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \\ =&uu^{\mathrm{T}}+u_{\bot}u_{\bot}^{\mathrm{T}}\\ \Leftrightarrow& u_{\bot}u_{\bot}^{\mathrm{T}}=I-uu^{\mathrm{T}} \end{aligned}\tag{42} I===⇔​UUT[u​u⊥​​][uTu⊥T​​]uuT+u⊥​u⊥T​u⊥​u⊥T​=I−uuT​(42)

根据 B ^ \hat{B} B^的定义和式(41)可知
B = W − 1 / 2 B ^ W − 1 / 2 = W − 1 / 2 u u T W − 1 / 2 + ( I − W − 1 / 2 u u T W 1 / 2 ) B k ( I − W 1 / 2 u u T W − 1 / 2 ) (43) \begin{aligned} B=&W^{-1/2}\hat{B}W^{-1/2}\\ =&W^{-1/2}uu^{\mathrm{T}}W^{-1/2}+(I-W^{-1/2}uu^{\mathrm{T}}W^{1/2})B_k(I-W^{1/2}uu^{\mathrm{T}}W^{-1/2}) \end{aligned}\tag{43} B==​W−1/2B^W−1/2W−1/2uuTW−1/2+(I−W−1/2uuTW1/2)Bk​(I−W1/2uuTW−1/2)​(43)

考虑到
∥ s ^ k ∥ 2 = s ^ k T s ^ k = y ^ k T s ^ k = y k T s k ⇓ u u T = s ^ k s ^ k T ∥ s ^ k ∥ 2 = y ^ k y ^ k T y k T s k = W 1 / 2 y k y k T y k T s k W 1 / 2 ⇓ { W − 1 / 2 u u T W − 1 / 2 = y k y k T y k T s k W − 1 / 2 u u T W 1 / 2 = ( W − 1 / 2 u u T W − 1 / 2 ) W = y k y k T y k T s k W = y k s k T y k T s k W 1 / 2 u u T W − 1 / 2 = W ( W − 1 / 2 u u T W − 1 / 2 ) = W y k y k T y k T s k = s k y k T y k T s k (44) \left\| \hat{s}_k \right\| ^2=\hat{s}_{k}^{\mathrm{T}}\hat{s}_k=\hat{y}_{k}^{\mathrm{T}}\hat{s}_k=y_{k}^{\mathrm{T}}s_k\\\Downarrow \\uu^{\mathrm{T}}=\frac{\hat{s}_k\hat{s}_{k}^{\mathrm{T}}}{\left\| \hat{s}_k \right\| ^2}=\frac{\hat{y}_k\hat{y}_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}=W^{1/2}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}W^{1/2}\\\Downarrow \\\left\{ \begin{array}{c} W^{-1/2}uu^{\mathrm{T}}W^{-1/2}=\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}\\ W^{-1/2}uu^{\mathrm{T}}W^{1/2}=\left( W^{-1/2}uu^{\mathrm{T}}W^{-1/2} \right) W=\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}W=\frac{y_ks_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}\\ W^{1/2}uu^{\mathrm{T}}W^{-1/2}=W\left( W^{-1/2}uu^{\mathrm{T}}W^{-1/2} \right) =W\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}=\frac{s_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}\\\end{array} \right.\tag{44} ∥s^k​∥2=s^kT​s^k​=y^​kT​s^k​=ykT​sk​⇓uuT=∥s^k​∥2s^k​s^kT​​=ykT​sk​y^​k​y^​kT​​=W1/2ykT​sk​yk​ykT​​W1/2⇓⎩⎪⎪⎪⎨⎪⎪⎪⎧​W−1/2uuTW−1/2=ykT​sk​yk​ykT​​W−1/2uuTW1/2=(W−1/2uuTW−1/2)W=ykT​sk​yk​ykT​​W=ykT​sk​yk​skT​​W1/2uuTW−1/2=W(W−1/2uuTW−1/2)=WykT​sk​yk​ykT​​=ykT​sk​sk​ykT​​​(44)

将式(44)代入式(43)即可得到DFP校正(33)。

标签:yk,推导,笔记,sk,ykT,ky,left,最优化,mathrm
来源: https://blog.csdn.net/xiaofei473/article/details/115712301