首页 > 其他分享> > 自动微分延迟计算

自动微分延迟计算

2021-12-15 06:31:26 作者：互联网

自动微分延迟计算

BP(反向传播)，为什么BP不好呢？每一步都会保存了上一步中，计算出来的缓冲数据，这样在每次进行反向传播时，占用的内存比较高。

自动微分的核心概念，延迟计算。

先选取一个目标函数，求输出两个权重参数(W_1，W_2W1，W2)的导数。

先求出1/x的导数-1/x_2−1/x2，没有计算这个-1/x_2−1/x2的值。

前项与1的加法操作的导数，这个导数很显然是1。

对exp这个op求导，可得到导数exp(x)。

可得到JJ对W_1W1的偏导，将前面所有得到的式子，相应的op的输入代入，依次相乘。

对于W_2W2，同样如此。

自动求导和BP最主要的区别。自动求导没有提前计算每个算子的导数，仅仅将计算式子表达出来，直到确实需要求这个值时，整个计算流程才算开始执行。

这是一种延迟计算的思想，将所有要计算的路线都规划好后，再进行计算。这样一是可不用提前计算出中间变量；二是可根据导出的计算节点的拓扑关系，进行一些优化工作，比较灵活。

实现Autodiff算法

尝试实现这样一个autodiff的算法，先介绍assignment。

算法的基本流程：

根据目标函数f=(e^x+1)*e^xf=(ex+1)∗ex，求每个变量x_1，x_2，x_3，x_4x1，x2，x3，x4的导数。相应的伪代码如下：

这里是逆序推导的方式，从输出端一直推导到输入端，每一个节点都称为node，如x_4=x_2*x_3x4=x2∗x3，这时x_4x4就相当于一个node，这个node与这个节点的op有关系，如这个的op，就是mul乘法。

计算输出的导数(就是x_4x4的导数)，显然是1，x_4x4就是输出，node_to_grad储存每个node对应的导数。

这里\overline{x_4}x4表示x_4x4，这个node的输出导数。

走到第一个节点，x_4x4，往上走，求x_4x4两个输入的导数(x_2，x_3x2，x3)，逆序。

grad <- sum partial adjoints from output edges，将这个node的所有输出的边缘导数相加，求x_4x4两个输入node的导数，乘法操作，x_2x2的导数是x_3x3，x_3x3的导数是x_2x2。

x_3x3的导数是\overline{x_2}^1x21，x_2x2的导数是\overline{x_3}x3，红色连线的乘号表示，如果要求x_2x2的导数，df/dx_2=\overline{x_4}*\overline{x_3}df/dx2=x4∗x3。另一个同理。

将已经求出的所有node的导数，放到node_to_grad map中，后续使用。

将x_3x3的输出边缘导数都加起来，得到grad，用做x_3x3的输入的upstream gradient。

x_3x3的输入为x_2+1x2+1，第二次对x_2x2求导，这次在x_2+1x2+1式子中。求出的导数记为\overline{x_2}^2x22。

对x_2x2求导后，得到了x_3=x_2+1x3=x2+1式子，x_2x2的导数。为什么要专门强调一下呢？x_2x2在两个node中都出现了，最开始的目标函数展示：f=(e^x+1)*e^xf=(ex+1)∗ex，x_2x2的node，相当于e^xex的op，e^xex的op在式子中，出现了两次，分别是(e^x+1)(ex+1)与e^xex中，利用求导乘法法则，分别得到了两个导数值。

x^2x2的两个导数为\overline{x_2}^1x21和\overline{x_2}^2x22。

将x^2x2的两个导数\overline{x_2}^1x21和\overline{x_2}^2x22进行求和，就是\overline{x_2}x2，下面的红色连接的计算路径\overline{x_2}=\overline{x_4} * \overline{x_2}^1+\overline{x_2}^1(\overline{x_4}*\overline{x_3})x2=x4∗x21+x21(x4∗x3)

对于x_2x2的输入x_1x1，就是本身乘以upstream gradient。

将x_1x1的导数结果，添加进去。

x_1x1就是输入，整个自动微分结束。

相关代码

相关的实现代码在assignment-1中，重要的简单说一下:

Node是每个计算节点，每个具体的node继承类。类重载了加法和乘法操作。

lass Node(object):

    """Node in a computation graph."""

    def __init__(self):

        """Constructor，new node is indirectly created by Op object __call__ method.

            Instance variables

            ------------------

            self.inputs: the list of input nodes.

            self.op: the associated op object，

                e.g. add_op object if this node is created by adding two other nodes.

            self.const_attr: the add or multiply constant，

                e.g. self.const_attr=5 if this node is created by x+5.

            self.name: node name for debugging purposes.

"""

        self.inputs=[]

        self.op=None

        self.const_attr=None

        self.name=""

    def __add__(self，other):

        """Adding two nodes return a new node."""

        if isinstance(other，Node):

            new_node=add_op(self，other)

        else:

            # Add by a constant stores the constant in the new node's const_attr field.

            # 'other' argument is a constant

            new_node=add_byconst_op(self，other)

        return new_node

    def __mul__(self，other):

        """Multiplying two nodes return a new node."""

        if isinstance(other，Node):

            new_node=mul_op(self，other)

        else:

            new_node=mul_byconst_op(self，other)

        return new_node

    # Allow left-hand-side add and multiply.

    __radd__=__add__

    __rmul__=__mul__

    def __str__(self):

        """Allow print to display node name."""

        return self.name

    __repr__=__str__

最重要的自动求导的代码：

def gradients(output_node，node_list):

    """Take gradient of output node with respect to each node in node_list.

    Parameters

    ----------

    output_node: output node that we are taking derivative of.

    node_list: list of nodes that we are taking derivative wrt.

    Returns

    -------

    A list of gradient values，one for each node in node_list respectively.

"""

    # a map from node to a list of gradient contributions from each output node

    # 辅助map 用于存放所有node对应的未整理的grad

    node_to_output_grads_list={}

    # Special note on initializing gradient of output_node as oneslike_op(output_node):

    # We are really taking a derivative of the scalar reduce_sum(output_node)

    # instead of the vector output_node. But this is the common case for loss function.

    node_to_output_grads_list[output_node]=[oneslike_op(output_node)]

    # a map from node to the gradient of that node

    # 整理后的 对应node的grad map

    node_to_output_grad={}

    # Traverse graph in reverse topological order given the output_node that we are taking gradient wrt.

    # 首先使用dfs先序遍历一遍表达式 后再逆序一下 就得到反向的元素顺序

    reverse_topo_order=reversed(find_topo_sort([output_node]))

    for node in reverse_topo_order:

        # 一些变量的导数可能有多个部分 需要将计算出来的多项式导数加起来

        grad=sum_node_list(node_to_output_grads_list[node])

        # 最终将计算好的导数存放到 node_to_output_grad 这个 map 中

        node_to_output_grad[node]=grad

        # 根据上一个算子传递过来的 grad 与当前这个 算子node 得到 inputs 的 grads

        input_grads=node.op.gradient(node，grad)

        for id in range(len(node.inputs)):

            if node.inputs[id] not in node_to_output_grads_list:

                node_to_output_grads_list[node.inputs[id]]=[input_grads[id]]

            else:

              node_to_output_grads_list[node.inputs[id]].append(input_grads[id])

    grad_node_list=[node_to_output_grad[node] for node in node_list]

    return grad_node_list

只要没有实际执行，所有的node都是计算状态，一系列计算式子，没有实际进行计算。

这里简单演示一下，目标函数y=x_1*x_2+x_1y=x1∗x2+x1中，对x_1x1和x_2x2的导数，grad_x1和grad_x1为需要所求的结果。

x1=ad.Variable(name="x1")

x2=ad.Variable(name="x2")

y=x1 * x2+x1

grad_x1，grad_x2=ad.gradients(y，[x1，x2])

在debug中，可看到：

在深度学习中，损失函数值都是一个数，求损失对权重的梯度时，先将损失的向量转化为单个数字，用这个数字对权重求梯度，最后得到的值是单个值，不是向量也不是数组。

小结

最后，拿两张图小结下前面的几种微分方法：

参考资料

https://oldpan.me/archives/sysml-lesson-3

Clad-DIANA-HEP

DLSYS-lecture4

https://blog.csdn.net/qq_40878431/article/details/85942864

https://medium.com/machine-intelligence-report/how-do-neural-networks-work-57d1ab5337ce

Automatic Differentiation in Machine Learning: a Survey

标签：node,__,导数,list,微分,自动,output,grad,延迟
来源： https://www.cnblogs.com/wujianming-110117/p/15690927.html