其他分享
首页 > 其他分享> > TVM:解析TVM算子

TVM:解析TVM算子

作者:互联网

在对[TVM:编译流程]一文中,从ONNX模型中读取模型并转换为relay IR,其中调用_convert_operator函数关于将onnx算子转换成Relay算子,其中如何实现当时直接跳过去了,本节将以卷积算子为例,看下Relay表达式是如何转换为TOPI算子并结合TVM的scheduler在后端上运行的

Relay卷积算子的转换过程

卷积算子转换过程如下:
"Conv": Conv.get_converter(opset),中get_converter()的实现如下:

class OnnxOpConverter(object):
    """A helper class for holding onnx op converters."""

    @classmethod
    def get_converter(cls, opset):
        """Get converter matches given opset.

        Parameters
        ----------
        opset: int
            opset from model.

        Returns
        -------
        converter, which should be `_impl_vx`. Number x is the biggest
            number smaller than or equal to opset belongs to all support versions.
        """
        versions = [int(d.replace("_impl_v", "")) for d in dir(cls) if "_impl_v" in d]
        versions = sorted(versions + [opset])
        version = versions[max([i for i, v in enumerate(versions) if v == opset]) - 1]
        if hasattr(cls, "_impl_v{}".format(version)):
            return getattr(cls, "_impl_v{}".format(version))
        raise NotImplementedError(
            "opset version {} of {} not implemented".format(version, cls.__name__)
        )

他是调用了父类的方法get_converter,再通过getattr()调用Conv类中的_impl_v1实现,如下:

class Conv(OnnxOpConverter):
    """Operator converter for Conv."""

    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        # Use shape of input to determine convolution type.
        data = inputs[0]
        kernel = inputs[1]
        input_shape = infer_shape(data)
        ndim = len(input_shape)
		'处理ONNX的卷积算子属性与TVM Relay的卷积OP属性不一致的问题'
        kernel_type = infer_type(inputs[1])
        kernel_shapes = [get_const_tuple(kernel_type.checked_type.shape)]

        if "kernel_shape" not in attr:
            attr["kernel_shape"] = kernel_shapes[0][2:]

        if "auto_pad" in attr:
            attr["auto_pad"] = attr["auto_pad"].decode("utf-8")
            if attr["auto_pad"] in ("SAME_UPPER", "SAME_LOWER"):
                # Warning: Convolution does not yet support dynamic shapes,
                # one will need to run dynamic_to_static on this model after import
                data = autopad(
                    data,
                    attr.get("strides", [1] * (ndim - 2)),
                    attr["kernel_shape"],
                    attr.get("dilations", [1] * (ndim - 2)),
                    mode=attr["auto_pad"],
                )
            elif attr["auto_pad"] == "VALID":
                attr["pads"] = [0 for i in range(ndim - 2)]
            elif attr["auto_pad"] == "NOTSET":
                pass
            else:
                msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
                raise tvm.error.OpAttributeInvalid(msg.format(attr["auto_pad"]))
            attr.pop("auto_pad")

        attr["channels"] = kernel_shapes[0][0]
		`完成属性的转换以及OP转换`
        out = AttrCvt(
            op_name=dimension_picker("conv"),
            transforms={
                "kernel_shape": "kernel_size",
                "dilations": ("dilation", 1),
                "pads": ("padding", 0),
                "group": ("groups", 1),
            },
            custom_check=dimension_constraint(),
        )([data, kernel], attr, params)

        use_bias = len(inputs) == 3
        if use_bias:
            out = _op.nn.bias_add(out, inputs[2])
        return out

以及AttrCvt函数的实现:


class AttrCvt(object):
    """Common attribute converter. An AttrConverter instance is a callable:
    ```
    attr_converter = AttrConverter(op_name, transforms={'a':'b', 'c':('d', 1)})
    new_op_name, new_attr = attr_converter(attrs)
    ```

    Parameters
    ----------
    op_name : str or callable
        If set as str, returned operator name is the str.
        If set as callable, returned operator is the str returned by calling:
        `op_name = func(attr)`

    transforms : dict of `new_name, or (new_name, default_value, transform function)`
        If only a new_name is provided, it's like renaming the attribute name.
        If default_value if provided, then the attribute is considered as optional.
        If transform function is provided, the original attribute value is handled
        by transform function.

    excludes : list
        A list of excluded attributes that should `NOT` appear.
        Raise NotImplementedError if occurred.

    disables : list
        A list of attributes that is disabled in relay. Log warnings.

    ignores : list
        A list of attributes that is ignored in relay. Debug level logging.

    extras : dict
        A series of additional attributes should be added anyway to the returned
        attribute dict.

    custom_check : callable
        A custom function takes attribute, and return True/False.
        Raise RuntimeError if not bool(True) returned.
    """

    def __init__(
        self,
        op_name,
        transforms=None,
        excludes=None,
        disables=None,
        ignores=None,
        extras=None,
        custom_check=None,
    ):
        self._op_name = op_name
        self._transforms = transforms if transforms else {}
        self._excludes = excludes if excludes else []
        self._disables = disables if disables else []
        self._ignores = ignores if ignores else []
        self._extras = extras if extras else {}
        self._custom_check = custom_check

    def __call__(self, inputs, attrs, *args):
        self._ignores.append("_output_shapes")
        self._ignores.append("_input_shapes")
        self._ignores.append("T")
        self._ignores.append("use_cudnn_on_gpu")
        self._ignores.append("_node_name")
        self._ignores.append("is_training")
        self._ignores.append("_target_layout")

        # apply custom check
        if self._custom_check:
            func, msg = self._custom_check
            if not func(attrs):
                raise RuntimeError("Check failed: {}".format(msg))
        # get new op_name
        if isinstance(self._op_name, str):
            op_name = self._op_name
        else:
            assert callable(self._op_name), "op_name can either be string or callable"
            op_name = self._op_name(attrs)

        # ignore 'tvm_custom' always
        self._ignores.append("tvm_custom")

        # convert attributes
        new_attrs = {}
        for k in attrs.keys():
            if k in self._excludes:
                raise NotImplementedError(
                    "Attribute %s in operator %s is not" + " supported.", k, op_name
                )
            if k in self._disables:
                logger.debug("Attribute %s is disabled in relay.sym.%s", k, op_name)
            elif k in self._ignores:
                if k != "tvm_custom":
                    logger.debug("Attribute %s is ignored in relay.sym.%s", k, op_name)
            elif k in self._transforms:
                new_name, defaults, transform = self._parse_default(self._transforms[k])
                if defaults is None:
                    new_attr = self._required_attr(attrs, k)
                else:
                    new_attr = attrs.get(k, None)
                if new_attr is None:
                    new_attrs[new_name] = defaults
                else:
                    new_attrs[new_name] = transform(new_attr)
            else:
                # copy
                new_attrs[k] = attrs[k]
        # add extras
        new_attrs.update(self._extras)
        return get_relay_op(op_name)(*inputs, **new_attrs)

这个类核心就是调用了AttrCvt函数,完成了ONNX卷积算子到Relay 卷积算子的转换。这个转换包含了属性的转换以及根据layout对weights,inputs,outputs进行重排并返回一个Relay 卷积算子。(在tensorflow中倒是看到了对应代码的描述,在onnx模型转换中并没有看到类似的代码)
AttrCvt的调用位于python/tvm/relay/frontend/common.py文件中,根据类注释可知,这个类主要是实现了算子转换,即根据输入的op_name映射到relay的算子。具体过程是:先对传入的attrs进行检查,如有非法属性就报错,如果属性有相应的转换策略就直接转换(即上述代码中的transform),最后调用get_relay_op返回一个TVM Relay卷积算子。

get_relay_op函数的实现如下:

def get_relay_op(op_name):
    """Get the callable function from Relay based on operator name.
    Parameters
    ----------
    op_name : str
        The Relay operator name.
    """
    if "." in op_name:
        # explicit hierarchical modules
        op = _op
        try:
            for opn in op_name.split("."):
                op = getattr(op, opn)
        except AttributeError:
            op = None
    else:
        # try search op in various modules
        for candidate in (_op, _op.nn, _op.image, _op.vision, _op.contrib):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if not op:
        raise tvm.error.OpNotImplemented("Unable to map op_name {} to relay".format(op_name))
    return op

所有的op都位于python/tvm/relay/op包中,conv在op/nn中定义
在上述的for循环中,在python/tvm/relay/op/下搜索满足op name为op_name的Relay算子,找到就返回。
至于为什么要分两种情况,这是既支持用户写module.xxx也支持直接写xxx,这里的module可以是python/tvm/relay/op包中的任何一级文件夹比如nn。
nn.py中conv2d的实现如下:


def conv2d(
    data,
    weight,
    strides=(1, 1),
    padding=(0, 0),
    dilation=(1, 1),
    groups=1,
    channels=None,
    kernel_size=None,
    data_layout="NCHW",
    kernel_layout="OIHW",
    out_layout="",
    out_dtype="",
):
    r"""2D convolution.

    This operator takes the weight as the convolution kernel
    and convolves it with data to produce an output.


    In the default case, where the data_layout is `NCHW`
    and kernel_layout is `OIHW`, conv2d takes in
    a data Tensor with shape `(batch_size, in_channels, height, width)`,
    and a weight Tensor with shape `(channels, in_channels, kernel_size[0], kernel_size[1])`
    to produce an output Tensor with the following rule:

    .. math::

        \mbox{out}[b, c, y, x] = \sum_{dy, dx, k}
           \mbox{data}[b, k, \mbox{strides}[0] * y  + dy, \mbox{strides}[1] * x + dx] *
           \mbox{weight}[c, k, dy, dx]

    Padding and dilation are applied to data and weight respectively before the computation.
    This operator accepts data layout specification.
    Semantically, the operator will convert the layout to the canonical layout
    (`NCHW` for data and `OIHW` for weight), perform the computation,
    then convert to the out_layout.


    Parameters
    ----------
    data : tvm.relay.Expr
        The input data to the operator.

    weight : tvm.relay.Expr
        The weight expressions.

    strides : Optional[int, Tuple[int]]
        The strides of convolution.

    padding : Optional[int, Tuple[int]]
        The padding of convolution on both sides of inputs before convolution.

    dilation : Optional[int, Tuple[int]]
        Specifies the dilation rate to be used for dilated convolution.

    groups : Optional[int]
        Number of groups for grouped convolution.

    channels : Optional[int]
        Number of output channels of this convolution.

    kernel_size : Optional[int, Tuple[int]]
        The spatial of the convolution kernel.

    data_layout : Optional[str]
        Layout of the input.

    kernel_layout : Optional[str]
        Layout of the weight.

    out_layout : Optional[str]
        Layout of the output, by default, out_layout is the same as data_layout

    out_dtype : Optional[str]
        Specifies the output data type for mixed precision conv2d.

    Returns
    -------
    result : tvm.relay.Expr
        The computed result.
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(strides, int):
        strides = (strides, strides)
    if isinstance(dilation, int):
        dilation = (dilation, dilation)
    # TODO enforce 4-way padding in topi/nn/conv2d after #4644 merged
    # convert 2-way padding to 4-way padding
    padding = get_pad_tuple2d(padding)
    return _make.conv2d(
        data,
        weight,
        strides,
        padding,
        dilation,
        groups,
        channels,
        kernel_size,
        data_layout,
        kernel_layout,
        out_layout,
        out_dtype,
    )

调用关系:conv2d() ->_make.conv2d(),在_make.py中实现了C++类到python类的接口暴露

import tvm._ffi
tvm._ffi._init_api("relay.op.nn._make", __name__)

conv2d的注册代码如下:


TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                       Array<IndexExpr> dilation, int groups, IndexExpr channels,
                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
                       String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
                                   "nn.conv2d");
    });

RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**:  This depends on the `layout` parameter. Output is 4D array of shape
            (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.

)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>)
    .set_attr<TOpPattern>("TOpPattern", kOutEWiseFusable);

TVM_REGISTER_GLOBAL这个宏定义将算子注册到一个全局对象中。可以看一下这个宏定义:

#define TVM_REGISTER_GLOBAL(OpName) \
  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)

可以看到注册的实现在Registry类中,这个类有一个Register成员函数,这个函数会通过全局manager来将算子注册进去:

Registry& Registry::Register(const std::string& name, bool can_override) {  // NOLINT(*)
  Manager* m = Manager::Global();
  std::lock_guard<std::mutex> lock(m->mutex);
  if (m->fmap.count(name)) {
    ICHECK(can_override) << "Global PackedFunc " << name << " is already registered";
  }

  Registry* r = new Registry();
  r->name_ = name;
  m->fmap[name] = r;
  return *r;
}

其中set_body将通过MakeConv构建一个conv算子,然后注册到registry中。在MakeConv中,首先根据传入的conv参数,包括strides,kernel,layout等,构建atrrs对象,然后根据op的名字从已经注册过的conv算子中得到conv的算子,然后再将attrs和op一起打包到call类中。即在tvm/src/relay/op/nn/convolution_make.h中的:


template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                     Array<IndexExpr> dilation, int groups, IndexExpr channels,
                     Array<IndexExpr> kernel_size, std::string data_layout,
                     std::string kernel_layout, std::string out_layout, DataType out_dtype,
                     std::string op_name) {
  auto attrs = make_object<T>();
  attrs->strides = std::move(strides);
  attrs->padding = std::move(padding);
  attrs->dilation = std::move(dilation);
  attrs->groups = groups;
  attrs->channels = std::move(channels);
  attrs->kernel_size = std::move(kernel_size);
  attrs->data_layout = std::move(data_layout);
  attrs->kernel_layout = std::move(kernel_layout);
  attrs->out_layout = std::move(out_layout);
  attrs->out_dtype = std::move(out_dtype);
  const Op& op = Op::Get(op_name);
  return Call(op, {data, weight}, Attrs(attrs), {});
}

Call是继承了Expr类:

class Call : public Expr {
 public:
  /*!
   * \brief The destructor
   */
  ~Call();

  /*!
   * \brief The constructor
   * \param op The operator will be invoked.
   * \param args The arguments of the call.
   * \param attrs The attributes of the call node.
   * \param type_args The type arguments passed to a polymorphic function.
   * \param span The source span of the expression.
   */
  TVM_DLL Call(Expr op, Array<Expr> args, Attrs attrs = Attrs(),
               Array<Type> type_args = Array<Type>(), Span span = Span());

  TVM_DEFINE_OBJECT_REF_METHODS(Call, RelayExpr, CallNode);
  TVM_DEFINE_OBJECT_REF_COW_METHOD(CallNode);
};

Op算子是通过RELAY_REGISTER_OP注册到一个公共AttrRegistry中的。
在一个op类中实际上并没有包含这个op的计算过程,只是纳入了这个算子的输入输出以及属性的信息。

特别注意Relay OP并没有包含具体的计算过程!上面的一系列操作仅仅是拿到了Relay 卷积OP的IR以及输入和属性。那么这个OP的计算过程是在哪里完成的呢?TOPI

定义算子的compute函数

算子的compute函数是算子的计算过程实现。
nn.conv2d的算子算法实现入口为python/tvm/topi/nn/conv2d.py中定义的conv2d函数,调用了同文件中的conv接口,在该接口中实现了compute函数并调用:

def conv(
    inp: te.Tensor,
    filt: te.Tensor,
    stride: Union[int, Sequence[int]],
    padding: Union[int, Sequence[int]],
    dilation: Union[int, Sequence[int]],
    groups: int,
    order: str,
    out_dtype: Union[str, None] = None,
):
    
    ...
 
 
    def compute(*args):
        nn, ff, *dim_indices = list(np.array(args)[permutation_to])
        return te.sum(
            temp.__getitem__(
                tuple(
                    np.array(
                        [nn, ff // (num_filter // groups) * (in_channel // groups) + rc]
                        + [
                            di * stride + r * dil
                            for di, stride, r, dil in zip(dim_indices, strides, rs, dilations)
                        ]
                    )[permutation_from]
                )
            ).astype(out_dtype)
            * filt.__getitem__(tuple(np.array([ff, rc] + rs)[permutation_from_kernel])).astype(
                out_dtype
            ),
            # Schedules depend on reduction axes being in the same order as the
            # layout, so we reorder here.
            axis=np.array([rc, *rs])[permutation_from_reductions].tolist(),
        )
 
    return te.compute(
        list(np.array([batch, out_channel] + out_dimensions)[permutation_from]),
        compute,
        # tag is expected to be lowercase
        tag=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
        name=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
    )

这里只是conv2d的默认compute,根据参数和输入数据的排布格式,在 python/tvm/topi/nn/conv2d.py中定义了对应的compute函数。

注册算子的compute函数和schedule

在实现算子的compute函数后,需要将这个compute函数加入relay算子中
在TVM中,这意味着我们不仅仅只是实现计算方法,还要给出对应的调度schedule策略,也就是为compute挑选合适的schedule。例如,当2d卷积是一个分组卷积时,我们会给它分配合适的计算方法和调度。conv2d的shedule定义在python/tvm/topi/generic/nn.py中,以schedule_conv2d_开头的函数定义了各种数据排布格式对应的调度策略,大部分都是使用了默认的调度方法
conv2dstrategy函数conv2d_strategy定义在python/tvm/relay/op/strategy/generic.py中。在该函数中,根据输入数据和卷积核的排布格式,给出各种排布组合的计算方法和调度。compute和schedule的组合即strategy。
这样,relay中就已经增加了我们的算子,便可以通过Relay Call Node来调用它。这一步我们要写一个接口,将参数传入算子,然后返回一个Relay Call Node。这个Node可以加入Relay的语法树。

不支持直接调用 Attrs和参数,所以这里用Op::Get从算子注册表中获取算子信息,作为参数传递给Call Nodenn.conv2d的Relay Call Node生成函数(src/relay/op/nn/convolution_make.h),即上面的MakeConv函数

当模板参数为Conv2DAttrs的时候,即生成的nn.conv2d的Relay Call Node。这里先是new了一个Conv2DAttrs,接收传入的各参数和属性;然后获取2d卷积注册信息,一并传给Call;最后返回CallNode类型实例的引用

在定义Relay Call Node函数后,我们要向Python注册一个接口来调用这个函数。这里注册是使用TVM_REGISTER_GLOBAL宏。注册后,在Python中就可以用relay.op._make.xxx(...)形式调用了。nn.conv2d的注册,该注册在本文开头

参考:https://zhuanlan.zhihu.com/p/368940120
https://blog.csdn.net/zx_ros/article/details/123526147

标签:kernel,layout,name,._,TVM,算子,解析,data,op
来源: https://www.cnblogs.com/whiteBear/p/16558215.html