
Gradient descent for neural networks


We continue with the network from the earlier overview, and consider it performing binary classification.


Let us now look at gradient descent for this network.

\[Parameters: \mathop{W^{[1]}}\limits_{(n^{[1]},n^{[0]})}, \mathop{b^{[1]}}\limits_{(n^{[1]},1)}, \mathop{W^{[2]}}\limits_{(n^{[2]},n^{[1]})}, \mathop{b^{[2]}}\limits_{(n^{[2]},1)} \\Layer\;sizes:\;n^{[0]}=n_x,\;n^{[1]},\;n^{[2]}=1 \\Cost\;function:J(W^{[1]},b^{[1]},W^{[2]},b^{[2]}) =\frac{1}{m}\sum^{m}_{i=1}\mathcal{L}(\mathop{\hat{y}^{(i)}}\limits_{\uparrow_{a^{[2]}}},y^{(i)}) \\Gradient\;descent: \\Repeat:\;Compute\;predictions\;(\hat{y}^{(i)},i=1,\cdots ,m)\\ \begin{array}{c} dW^{[1]} = \frac{ \partial J}{ \partial W^{[1]}},\; db^{[1]} = \frac{ \partial J}{ \partial b^{[1]}},\\ dW^{[2]} = \frac{ \partial J}{ \partial W^{[2]}},\; db^{[2]} = \frac{ \partial J}{ \partial b^{[2]}},\\ W^{[1]} := W^{[1]}-\alpha\, dW^{[1]}\\ b^{[1]} := b^{[1]}-\alpha\, db^{[1]}\\ W^{[2]} := W^{[2]}-\alpha\, dW^{[2]}\\ b^{[2]} := b^{[2]}-\alpha\, db^{[2]}\\ \end{array} \\(Note:\;\alpha\;is\;the\;learning\;rate;\;[:=]\;can\;also\;be\;written\;as\;[=],\;it\;just\;emphasizes\;the\;iterative\;update.) \]
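The update step above can be written as a minimal NumPy sketch; the `params`/`grads` dictionaries, their key names, and the default learning rate are assumptions made here for illustration:

```python
def update_parameters(params, grads, alpha=0.01):
    """One gradient descent step: W := W - alpha*dW, b := b - alpha*db."""
    # Assumed key layout: params = {"W1", "b1", "W2", "b2"}, grads = {"dW1", "db1", ...}
    for key in ("W1", "b1", "W2", "b2"):
        params[key] = params[key] - alpha * grads["d" + key]
    return params
```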

The question is now obvious: how do we compute these partial derivatives? Forward propagation computes the predictions, and backpropagation then gives the gradients.

\[Forward\;propagation:\\ Z^{[1]}=W^{[1]}X+b^{[1]}\\ A^{[1]}=g^{[1]}(Z^{[1]})\\ Z^{[2]}=W^{[2]}A^{[1]}+b^{[2]}\\ A^{[2]}=g^{[2]}(Z^{[2]})=\sigma(Z^{[2]}) \]
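As a rough NumPy sketch of this forward pass (assuming tanh for the hidden activation g^{[1]}, inputs X stacked column-wise with shape (n_x, m), and the same hypothetical `params` dictionary as in the update sketch above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, params):
    """Forward pass of the two-layer network; X has shape (n_x, m)."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    Z1 = W1 @ X + b1          # shape (n1, m)
    A1 = np.tanh(Z1)          # hidden activation g[1] (tanh assumed here)
    Z2 = W2 @ A1 + b2         # shape (1, m)
    A2 = sigmoid(Z2)          # output activation g[2] = sigma, i.e. y-hat
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
```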

\[Back\;propagation:\\ dZ^{[2]}=A^{[2]}-Y \quad where\;Y=[y^{(1)},y^{(2)},\cdots,y^{(m)}]\\ dW^{[2]}=\frac{1}{m}dZ^{[2]}A^{[1]T}\\ db^{[2]}=\frac{1}{m}np.sum(dZ^{[2]},axis=1,\mathop{keepdims=True}\limits^{so\;that\;db^{[2]}\;has\;shape\;(n^{[2]},1)\;rather\;than\;(n^{[2]},)})\\ dZ^{[1]}=\mathop{W^{[2]T}dZ^{[2]}}\limits_{(n^{[1]},m)}*\mathop{g^{[1]'}(Z^{[1]})}\limits_{(n^{[1]},m)}\quad where\;*\;is\;elementwise\;multiplication\\ dW^{[1]}=\frac{1}{m}dZ^{[1]}X^T\\ db^{[1]}=\frac{1}{m}np.sum(dZ^{[1]},axis=1,\mathop{keepdims=True}\limits^{so\;that\;db^{[1]}\;has\;shape\;(n^{[1]},1)\;rather\;than\;(n^{[1]},)}) \]
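These gradient formulas translate directly into a NumPy sketch; with the assumed tanh hidden activation, g^{[1]'}(Z^{[1]}) becomes 1 - (A^{[1]})^2, and the dictionary layout is the same hypothetical one used above:

```python
import numpy as np

def back_propagation(X, Y, params, cache):
    """Backward pass computing dW1, db1, dW2, db2 per the formulas above."""
    m = X.shape[1]
    W2 = params["W2"]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y                                        # shape (1, m)
    dW2 = (1 / m) * dZ2 @ A1.T                          # shape (1, n1)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)  # keepdims keeps shape (1, 1)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)                  # elementwise *, tanh'(Z1) = 1 - A1^2
    dW1 = (1 / m) * dZ1 @ X.T                           # shape (n1, n_x)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)  # keepdims keeps shape (n1, 1)
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

A training loop would then repeat forward_propagation, back_propagation, and update_parameters until the cost stops decreasing.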


Source: https://www.cnblogs.com/Linkdom/p/16098092.html