
DNN-BP Study Notes


Given: \(a^l = \sigma(z^l) = \sigma(W^la^{l-1} + b^l)\)

Define a quadratic loss function (any other differentiable loss would work as well):

\(J(W,b) = \frac{1}{2}||a^L-y||^2\)

Goal: find the gradients of \(J\) with respect to every layer's \(W\) and \(b\), which gradient descent then uses to update them.
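For concreteness, here is a minimal NumPy sketch of the forward pass and the quadratic loss above. The sigmoid activation and the names `forward`, `Ws`, `bs` are illustrative assumptions, not part of the original notes; the derivation itself works for any differentiable \(\sigma\).

```python
import numpy as np

def sigmoid(z):
    # An assumed choice of activation sigma.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Ws, bs):
    """Compute a^l = sigma(W^l a^{l-1} + b^l) layer by layer.

    Returns all pre-activations z^l and activations a^l, which the
    backward pass derived below will need.
    """
    a = x
    zs, activations = [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b          # z^l = W^l a^{l-1} + b^l
        a = sigmoid(z)         # a^l = sigma(z^l)
        zs.append(z)
        activations.append(a)
    return zs, activations

def quadratic_loss(a_L, y):
    # J(W, b) = 1/2 * ||a^L - y||^2
    return 0.5 * np.sum((a_L - y) ** 2)
```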

First, for the output layer (layer \(L\)):

\[a^L = \sigma(z^L) = \sigma(W^La^{L-1} + b^L)\\ J(W,b) = \frac{1}{2}||a^L-y||^2 = \frac{1}{2}|| \sigma(W^La^{L-1} + b^L)-y||^2 \]

Take the gradient with respect to \(W\) and \(b\) separately:

\[\frac{\partial J(W,b)}{\partial W^L} = \frac{\partial J(W,b)}{\partial z^L}\frac{\partial z^L}{\partial W^L} , \frac{\partial J(W,b)}{\partial b^L} = \frac{\partial J(W,b)}{\partial z^L}\frac{\partial z^L}{\partial b^L} \]

The two share the common factor \(\frac{\partial J(W,b)}{\partial z^L}\), so let \(\delta^L = \frac{\partial J(W,b)}{\partial z^L}\).

To make this easier to follow, first work component-wise: let \(\delta^L_j = \frac{\partial J(W,b)}{\partial z^L_j}\) (the error at the \(j\)-th neuron of layer \(L\)). Then:

\[ \delta^L_j = \frac{\partial J(W,b)}{\partial z^L_j} \\ = \sum_k \frac{\partial J(W,b)}{\partial a^L_k} \frac{\partial a^L_k}{\partial z^L_j} \\ = \frac{\partial J(W,b)}{\partial a^L_j} \frac{\partial a^L_j}{\partial z^L_j} \]

(The sum over \(k\) collapses to the single \(k=j\) term because \(a^L_k = \sigma(z^L_k)\) depends only on \(z^L_k\), so \(\partial a^L_k/\partial z^L_j = 0\) for \(k \neq j\).)

\[ \delta^L_j = \frac{\partial J(W,b)}{\partial a^L_j} \sigma'(z^L_j) = (a^L_j-y_j) \sigma'(z^L_j)\\ \delta^L = \frac{\partial J(W,b)}{\partial z^L} = (a^L-y)\odot \sigma'(z^L) \]

The gradients of layer \(L\)'s \(W\) and \(b\) are therefore:

\[\frac{\partial J(W,b)}{\partial W^L} = \frac{\partial J(W,b)}{\partial z^L}\frac{\partial z^L}{\partial W^L} =(a^L-y) \odot \sigma'(z^L)(a^{L-1})^T\\ \frac{\partial J(W,b)}{\partial b^L} = \frac{\partial J(W,b)}{\partial z^L}\frac{\partial z^L}{\partial b^L} =(a^L-y)\odot \sigma'(z^L) \]
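These output-layer formulas translate directly into code. A sketch, assuming the hypothetical `sigmoid` helper above and column vectors of shape (n, 1), so that the outer product \(\delta^L (a^{L-1})^T\) has the same shape as \(W^L\):

```python
def output_layer_grads(z_L, a_L, a_prev, y):
    """Gradients of the quadratic loss at the output layer L.

    delta^L  = (a^L - y) ⊙ sigma'(z^L)
    dJ/dW^L  = delta^L (a^{L-1})^T
    dJ/db^L  = delta^L
    """
    s = sigmoid(z_L)
    sigma_prime = s * (1.0 - s)        # sigma'(z) = sigma(z)(1 - sigma(z)) for sigmoid
    delta_L = (a_L - y) * sigma_prime  # elementwise (Hadamard) product
    dW_L = delta_L @ a_prev.T          # outer product, same shape as W^L
    db_L = delta_L
    return delta_L, dW_L, db_L
```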


Next, we recurse backward to obtain the gradients of layers \(L-1, L-2, \ldots\). From the network's forward-propagation relation:

\[z^{l+1}= W^{l+1}a^{l} + b^{l+1} = W^{l+1}\sigma(z^l) + b^{l+1} \]

we obtain:

\[\delta^{l} = \frac{\partial J(W,b)}{\partial z^l} = \frac{\partial J(W,b)}{\partial z^{l+1}}\frac{\partial z^{l+1}}{\partial z^{l}} = \delta^{l+1}\frac{\partial z^{l+1}}{\partial z^{l}} = (W^{l+1})^T\delta^{l+1}\odot \sigma'(z^l) \]

The last step can be verified component-wise:

\[ \delta^l_j = \frac{\partial J(W,b)}{\partial z^l_j} = \sum_k \frac{\partial J(W,b)}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k \frac{\partial z^{l+1}_k}{\partial z^l_j} \delta^{l+1}_k \]

\[ z^{l+1}_k = \sum_j w^{l+1}_{kj} a^l_j +b^{l+1}_k = \sum_j w^{l+1}_{kj} \sigma(z^l_j) +b^{l+1}_k \]

\[ \frac{\partial z^{l+1}_k}{\partial z^l_j} = w^{l+1}_{kj} \sigma'(z^l_j) \]

\[ \delta^l_j = \sum_k w^{l+1}_{kj} \delta^{l+1}_k \sigma'(z^l_j) \]

\[ \delta^l = ((W^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l) \]
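This recursion is one line of code per layer. A sketch, again assuming the sigmoid activation from the earlier snippets:

```python
def backprop_delta(delta_next, W_next, z_l):
    """One step of delta^l = ((W^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)."""
    s = sigmoid(z_l)
    return (W_next.T @ delta_next) * (s * (1.0 - s))
```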

Therefore:

\[\frac{\partial J(W,b)}{\partial W^l} = \frac{\partial J(W,b)}{\partial z^l} \frac{\partial z^l}{\partial W^l} = \delta^{l}(a^{l-1})^T\\ \frac{\partial J(W,b)}{\partial b^l} = \frac{\partial J(W,b)}{\partial z^l} \frac{\partial z^l}{\partial b^l} = \delta^{l} \]

Note: the symbol \(\odot\) denotes the element-wise (Hadamard) product, where corresponding entries are multiplied, as distinct from matrix multiplication.
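Putting the pieces together, a full backward pass over all layers might look like the following sketch. It reuses the hypothetical `forward` and `sigmoid` helpers defined above, and all vectors are assumed to be column vectors.

```python
def backward(x, y, Ws, bs):
    """Gradients dJ/dW^l and dJ/db^l for every layer.

    Combines delta^L = (a^L - y) ⊙ sigma'(z^L), the recursion
    delta^l = ((W^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l), and
    dJ/dW^l = delta^l (a^{l-1})^T, dJ/db^l = delta^l.
    """
    zs, acts = forward(x, Ws, bs)      # acts = [a^0, a^1, ..., a^L]
    L = len(Ws)
    dWs, dbs = [None] * L, [None] * L

    s = sigmoid(zs[-1])
    delta = (acts[-1] - y) * (s * (1.0 - s))   # delta^L at the output layer
    for l in range(L - 1, -1, -1):
        dWs[l] = delta @ acts[l].T             # dJ/dW^l = delta^l (a^{l-1})^T
        dbs[l] = delta                         # dJ/db^l = delta^l
        if l > 0:
            s = sigmoid(zs[l - 1])
            delta = (Ws[l].T @ delta) * (s * (1.0 - s))  # recurse to the previous layer
    return dWs, dbs

# Usage on random shapes (3 -> 4 -> 2):
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
bs = [rng.standard_normal((4, 1)), rng.standard_normal((2, 1))]
x, y = rng.standard_normal((3, 1)), rng.standard_normal((2, 1))
dWs, dbs = backward(x, y, Ws, bs)
```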
