TVM: Parsing TVM Operators
The article [TVM: Compilation Flow] showed how a model is read from ONNX and converted to Relay IR; the conversion of ONNX operators into Relay operators happens in the _convert_operator function, and how that actually works was skipped at the time. This article takes the convolution operator as an example and looks at how the Relay expression is turned into a TOPI operator and, combined with TVM's schedule, runs on the backend.
The conversion process of the Relay convolution operator
The convolution operator's conversion starts from the following entry in the converter map:
"Conv": Conv.get_converter(opset),
where get_converter() is implemented as follows:
class OnnxOpConverter(object):
    """A helper class for holding onnx op converters."""

    @classmethod
    def get_converter(cls, opset):
        """Get converter matches given opset.

        Parameters
        ----------
        opset: int
            opset from model.

        Returns
        -------
        converter, which should be `_impl_vx`. Number x is the biggest
            number smaller than or equal to opset belongs to all support versions.
        """
        versions = [int(d.replace("_impl_v", "")) for d in dir(cls) if "_impl_v" in d]
        versions = sorted(versions + [opset])
        version = versions[max([i for i, v in enumerate(versions) if v == opset]) - 1]
        if hasattr(cls, "_impl_v{}".format(version)):
            return getattr(cls, "_impl_v{}".format(version))
        raise NotImplementedError(
            "opset version {} of {} not implemented".format(version, cls.__name__)
        )
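To make the version-selection rule concrete, here is a small sketch; the MyConv class and the opset numbers are made up for illustration, and OnnxOpConverter is assumed to be importable from tvm.relay.frontend.onnx as in mainline TVM. The converter with the largest _impl_vN such that N <= opset is returned.

from tvm.relay.frontend.onnx import OnnxOpConverter

class MyConv(OnnxOpConverter):
    # Hypothetical converter with two versioned implementations.
    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        return "v1"

    @classmethod
    def _impl_v11(cls, inputs, attr, params):
        return "v11"

# opset 9: versions become [1, 9, 11]; the entry just before the inserted opset is 1.
assert MyConv.get_converter(9)(None, None, None) == "v1"
# opset 13: versions become [1, 11, 13], so _impl_v11 is picked.
assert MyConv.get_converter(13)(None, None, None) == "v11"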
Conv calls get_converter from its parent class, which uses getattr() to dispatch to the _impl_v1 implementation of the Conv class, shown below:
class Conv(OnnxOpConverter):
    """Operator converter for Conv."""

    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        # Use shape of input to determine convolution type.
        data = inputs[0]
        kernel = inputs[1]
        input_shape = infer_shape(data)
        ndim = len(input_shape)

        # Reconcile ONNX Conv attributes with the attributes expected by the
        # TVM Relay convolution op.
        kernel_type = infer_type(inputs[1])
        kernel_shapes = [get_const_tuple(kernel_type.checked_type.shape)]

        if "kernel_shape" not in attr:
            attr["kernel_shape"] = kernel_shapes[0][2:]

        if "auto_pad" in attr:
            attr["auto_pad"] = attr["auto_pad"].decode("utf-8")
            if attr["auto_pad"] in ("SAME_UPPER", "SAME_LOWER"):
                # Warning: Convolution does not yet support dynamic shapes,
                # one will need to run dynamic_to_static on this model after import
                data = autopad(
                    data,
                    attr.get("strides", [1] * (ndim - 2)),
                    attr["kernel_shape"],
                    attr.get("dilations", [1] * (ndim - 2)),
                    mode=attr["auto_pad"],
                )
            elif attr["auto_pad"] == "VALID":
                attr["pads"] = [0 for i in range(ndim - 2)]
            elif attr["auto_pad"] == "NOTSET":
                pass
            else:
                msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
                raise tvm.error.OpAttributeInvalid(msg.format(attr["auto_pad"]))
            attr.pop("auto_pad")

        attr["channels"] = kernel_shapes[0][0]

        # Convert the remaining attributes and emit the Relay op.
        out = AttrCvt(
            op_name=dimension_picker("conv"),
            transforms={
                "kernel_shape": "kernel_size",
                "dilations": ("dilation", 1),
                "pads": ("padding", 0),
                "group": ("groups", 1),
            },
            custom_check=dimension_constraint(),
        )([data, kernel], attr, params)

        use_bias = len(inputs) == 3
        if use_bias:
            out = _op.nn.bias_add(out, inputs[2])
        return out
The AttrCvt class itself is implemented as follows:
class AttrCvt(object):
    """Common attribute converter. An AttrConverter instance is a callable:
    ```
    attr_converter = AttrConverter(op_name, transforms={'a':'b', 'c':('d', 1)})
    new_op_name, new_attr = attr_converter(attrs)
    ```

    Parameters
    ----------
    op_name : str or callable
        If set as str, returned operator name is the str.
        If set as callable, returned operator is the str returned by calling:
        `op_name = func(attr)`

    transforms : dict of `new_name, or (new_name, default_value, transform function)`
        If only a new_name is provided, it's like renaming the attribute name.
        If default_value if provided, then the attribute is considered as optional.
        If transform function is provided, the original attribute value is handled
        by transform function.

    excludes : list
        A list of excluded attributes that should `NOT` appear.
        Raise NotImplementedError if occurred.

    disables : list
        A list of attributes that is disabled in relay. Log warnings.

    ignores : list
        A list of attributes that is ignored in relay. Debug level logging.

    extras : dict
        A series of additional attributes should be added anyway to the returned
        attribute dict.

    custom_check : callable
        A custom function takes attribute, and return True/False.
        Raise RuntimeError if not bool(True) returned.
    """

    def __init__(
        self,
        op_name,
        transforms=None,
        excludes=None,
        disables=None,
        ignores=None,
        extras=None,
        custom_check=None,
    ):
        self._op_name = op_name
        self._transforms = transforms if transforms else {}
        self._excludes = excludes if excludes else []
        self._disables = disables if disables else []
        self._ignores = ignores if ignores else []
        self._extras = extras if extras else {}
        self._custom_check = custom_check

    def __call__(self, inputs, attrs, *args):
        self._ignores.append("_output_shapes")
        self._ignores.append("_input_shapes")
        self._ignores.append("T")
        self._ignores.append("use_cudnn_on_gpu")
        self._ignores.append("_node_name")
        self._ignores.append("is_training")
        self._ignores.append("_target_layout")

        # apply custom check
        if self._custom_check:
            func, msg = self._custom_check
            if not func(attrs):
                raise RuntimeError("Check failed: {}".format(msg))
        # get new op_name
        if isinstance(self._op_name, str):
            op_name = self._op_name
        else:
            assert callable(self._op_name), "op_name can either be string or callable"
            op_name = self._op_name(attrs)

        # ignore 'tvm_custom' always
        self._ignores.append("tvm_custom")

        # convert attributes
        new_attrs = {}
        for k in attrs.keys():
            if k in self._excludes:
                raise NotImplementedError(
                    "Attribute %s in operator %s is not" + " supported.", k, op_name
                )
            if k in self._disables:
                logger.debug("Attribute %s is disabled in relay.sym.%s", k, op_name)
            elif k in self._ignores:
                if k != "tvm_custom":
                    logger.debug("Attribute %s is ignored in relay.sym.%s", k, op_name)
            elif k in self._transforms:
                new_name, defaults, transform = self._parse_default(self._transforms[k])
                if defaults is None:
                    new_attr = self._required_attr(attrs, k)
                else:
                    new_attr = attrs.get(k, None)
                if new_attr is None:
                    new_attrs[new_name] = defaults
                else:
                    new_attrs[new_name] = transform(new_attr)
            else:
                # copy
                new_attrs[k] = attrs[k]
        # add extras
        new_attrs.update(self._extras)
        return get_relay_op(op_name)(*inputs, **new_attrs)
The core of this converter is the AttrCvt call, which completes the conversion from the ONNX convolution operator to a Relay convolution operator. The conversion covers attribute translation and, depending on layout, the reordering of weights, inputs, and outputs, and it returns a Relay convolution op. (The layout-reordering code is described in the TensorFlow frontend; no similar code appears in the ONNX conversion.)
AttrCvt is defined in python/tvm/relay/frontend/common.py. As its docstring explains, the class maps the given op_name to a Relay operator. The process is: check the incoming attrs and raise an error on illegal attributes, apply the conversion rule for attributes that have one (the transforms above), and finally call get_relay_op to return a TVM Relay convolution operator.
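As a rough illustration of what the AttrCvt call in Conv._impl_v1 ends up doing, the following sketch (shapes and values are made up) shows the equivalent Relay call the attribute mapping produces: ONNX kernel_shape/dilations/pads/group become Relay kernel_size/dilation/padding/groups.

from tvm import relay

data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
kernel = relay.var("weight", shape=(64, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(
    data,
    kernel,
    strides=(1, 1),
    padding=(1, 1, 1, 1),  # from ONNX "pads"
    dilation=(1, 1),       # from ONNX "dilations"
    groups=1,              # from ONNX "group"
    channels=64,           # kernel_shapes[0][0]
    kernel_size=(3, 3),    # from ONNX "kernel_shape"
)
print(out)  # a Relay CallNode for nn.conv2d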
The get_relay_op function is implemented as follows:
def get_relay_op(op_name):
    """Get the callable function from Relay based on operator name.

    Parameters
    ----------
    op_name : str
        The Relay operator name.
    """
    if "." in op_name:
        # explicit hierarchical modules
        op = _op
        try:
            for opn in op_name.split("."):
                op = getattr(op, opn)
        except AttributeError:
            op = None
    else:
        # try search op in various modules
        for candidate in (_op, _op.nn, _op.image, _op.vision, _op.contrib):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if not op:
        raise tvm.error.OpNotImplemented("Unable to map op_name {} to relay".format(op_name))
    return op
All ops live in the python/tvm/relay/op package; conv is defined in op/nn.
In the for loop above, the function searches python/tvm/relay/op/ for a Relay operator whose name matches op_name and returns it once found.
The two branches exist so that both the module.xxx spelling and the bare xxx spelling are supported, where module can be any sub-package of python/tvm/relay/op, such as nn.
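For example (a small sketch, assuming get_relay_op is importable from tvm.relay.frontend.common as in mainline TVM), both spellings resolve to the same callable:

from tvm.relay.frontend.common import get_relay_op
from tvm.relay import op as _op

# "nn.conv2d" walks the explicit module hierarchy; "conv2d" is found by
# probing the candidate modules, including _op.nn.
assert get_relay_op("nn.conv2d") is _op.nn.conv2d
assert get_relay_op("conv2d") is _op.nn.conv2d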
conv2d is implemented in nn.py as follows:
def conv2d(
    data,
    weight,
    strides=(1, 1),
    padding=(0, 0),
    dilation=(1, 1),
    groups=1,
    channels=None,
    kernel_size=None,
    data_layout="NCHW",
    kernel_layout="OIHW",
    out_layout="",
    out_dtype="",
):
    r"""2D convolution.

    This operator takes the weight as the convolution kernel
    and convolves it with data to produce an output.

    In the default case, where the data_layout is `NCHW`
    and kernel_layout is `OIHW`, conv2d takes in
    a data Tensor with shape `(batch_size, in_channels, height, width)`,
    and a weight Tensor with shape `(channels, in_channels, kernel_size[0], kernel_size[1])`
    to produce an output Tensor with the following rule:

    .. math::

        \mbox{out}[b, c, y, x] = \sum_{dy, dx, k}
            \mbox{data}[b, k, \mbox{strides}[0] * y + dy, \mbox{strides}[1] * x + dx] *
            \mbox{weight}[c, k, dy, dx]

    Padding and dilation are applied to data and weight respectively before the computation.
    This operator accepts data layout specification.
    Semantically, the operator will convert the layout to the canonical layout
    (`NCHW` for data and `OIHW` for weight), perform the computation,
    then convert to the out_layout.

    Parameters
    ----------
    data : tvm.relay.Expr
        The input data to the operator.

    weight : tvm.relay.Expr
        The weight expressions.

    strides : Optional[int, Tuple[int]]
        The strides of convolution.

    padding : Optional[int, Tuple[int]]
        The padding of convolution on both sides of inputs before convolution.

    dilation : Optional[int, Tuple[int]]
        Specifies the dilation rate to be used for dilated convolution.

    groups : Optional[int]
        Number of groups for grouped convolution.

    channels : Optional[int]
        Number of output channels of this convolution.

    kernel_size : Optional[int, Tuple[int]]
        The spatial of the convolution kernel.

    data_layout : Optional[str]
        Layout of the input.

    kernel_layout : Optional[str]
        Layout of the weight.

    out_layout : Optional[str]
        Layout of the output, by default, out_layout is the same as data_layout

    out_dtype : Optional[str]
        Specifies the output data type for mixed precision conv2d.

    Returns
    -------
    result : tvm.relay.Expr
        The computed result.
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(strides, int):
        strides = (strides, strides)
    if isinstance(dilation, int):
        dilation = (dilation, dilation)
    # TODO enforce 4-way padding in topi/nn/conv2d after #4644 merged
    # convert 2-way padding to 4-way padding
    padding = get_pad_tuple2d(padding)

    return _make.conv2d(
        data,
        weight,
        strides,
        padding,
        dilation,
        groups,
        channels,
        kernel_size,
        data_layout,
        kernel_layout,
        out_layout,
        out_dtype,
    )
Call chain: conv2d() -> _make.conv2d(). _make.py exposes the C++ functions to Python:
import tvm._ffi
tvm._ffi._init_api("relay.op.nn._make", __name__)
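A quick way to see what _init_api does (a small sketch): the C++ function registered under "relay.op.nn._make.conv2d" is a PackedFunc in TVM's global registry, and _init_api attaches it to this Python module, so the two lookups below find the same underlying function.

import tvm
from tvm.relay.op.nn import _make

f = tvm.get_global_func("relay.op.nn._make.conv2d")  # the global PackedFunc registered from C++
print(f)             # a tvm.runtime.PackedFunc
print(_make.conv2d)  # the same function, exposed on the Python module by _init_api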
conv2d is registered on the C++ side as follows:
TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                       Array<IndexExpr> dilation, int groups, IndexExpr channels,
                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
                       String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
                                   "nn.conv2d");
    });

RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**: This depends on the `layout` parameter. Output is 4D array of shape
           (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.

)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>)
    .set_attr<TOpPattern>("TOpPattern", kOutEWiseFusable);
The TVM_REGISTER_GLOBAL macro registers the function into a global object. The macro is defined as:
#define TVM_REGISTER_GLOBAL(OpName) \
TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)
The registration is implemented in the Registry class: its Register member function registers the function through the global Manager:
Registry& Registry::Register(const std::string& name, bool can_override) {  // NOLINT(*)
  Manager* m = Manager::Global();
  std::lock_guard<std::mutex> lock(m->mutex);
  if (m->fmap.count(name)) {
    ICHECK(can_override) << "Global PackedFunc " << name << " is already registered";
  }

  Registry* r = new Registry();
  r->name_ = name;
  m->fmap[name] = r;
  return *r;
}
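The same registry is reachable from Python. As a small illustration (the "demo.my_add" name is made up), a function registered with tvm.register_func lands in the same Manager::Global() table that TVM_REGISTER_GLOBAL populates from C++:

import tvm

@tvm.register_func("demo.my_add")
def my_add(a, b):
    # A trivial packed function registered from Python.
    return a + b

f = tvm.get_global_func("demo.my_add")
assert f(3, 4) == 7
# The conv2d maker registered above from C++ can be fetched the same way.
print(tvm.get_global_func("relay.op.nn._make.conv2d"))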
Here set_body constructs a conv operator call via MakeConv, and the resulting packed function is registered into the registry. Inside MakeConv, the conv parameters passed in (strides, kernel_size, layouts, and so on) are first packed into an attrs object; the conv Op is then looked up by name among the already-registered operators; finally, the attrs and the Op are wrapped together into a Call. This is MakeConv in tvm/src/relay/op/nn/convolution_make.h:
template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                     Array<IndexExpr> dilation, int groups, IndexExpr channels,
                     Array<IndexExpr> kernel_size, std::string data_layout,
                     std::string kernel_layout, std::string out_layout, DataType out_dtype,
                     std::string op_name) {
  auto attrs = make_object<T>();
  attrs->strides = std::move(strides);
  attrs->padding = std::move(padding);
  attrs->dilation = std::move(dilation);
  attrs->groups = groups;
  attrs->channels = std::move(channels);
  attrs->kernel_size = std::move(kernel_size);
  attrs->data_layout = std::move(data_layout);
  attrs->kernel_layout = std::move(kernel_layout);
  attrs->out_layout = std::move(out_layout);
  attrs->out_dtype = std::move(out_dtype);
  const Op& op = Op::Get(op_name);
  return Call(op, {data, weight}, Attrs(attrs), {});
}
Call inherits from the Expr class:
class Call : public Expr {
 public:
  /*!
   * \brief The destructor
   */
  ~Call();

  /*!
   * \brief The constructor
   * \param op The operator will be invoked.
   * \param args The arguments of the call.
   * \param attrs The attributes of the call node.
   * \param type_args The type arguments passed to a polymorphic function.
   * \param span The source span of the expression.
   */
  TVM_DLL Call(Expr op, Array<Expr> args, Attrs attrs = Attrs(),
               Array<Type> type_args = Array<Type>(), Span span = Span());

  TVM_DEFINE_OBJECT_REF_METHODS(Call, RelayExpr, CallNode);
  TVM_DEFINE_OBJECT_REF_COW_METHOD(CallNode);
};
The Op itself is registered into a common AttrRegistry by RELAY_REGISTER_OP. An Op does not contain the operator's computation; it only records the operator's inputs, outputs, and attribute information.
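This can be seen from Python with a small sketch (field access assumes the Op node layout of mainline TVM): relay.op.get returns the registered Op object, which carries only metadata such as its name and number of inputs.

from tvm import relay

conv2d_op = relay.op.get("nn.conv2d")  # the Op registered by RELAY_REGISTER_OP
print(conv2d_op.name)        # "nn.conv2d"
print(conv2d_op.num_inputs)  # 2, matching set_num_inputs(2)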
Note in particular that a Relay op contains no concrete computation! The steps above only produced the IR of the Relay convolution op together with its inputs and attributes. So where is the op's computation actually implemented? In TOPI.
Defining the operator's compute function
An operator's compute function implements its actual computation. The entry point for nn.conv2d is the conv2d function defined in python/tvm/topi/nn/conv2d.py, which calls the conv helper in the same file; that helper defines the compute function and invokes it:
def conv(
    inp: te.Tensor,
    filt: te.Tensor,
    stride: Union[int, Sequence[int]],
    padding: Union[int, Sequence[int]],
    dilation: Union[int, Sequence[int]],
    groups: int,
    order: str,
    out_dtype: Union[str, None] = None,
):
    ...

    def compute(*args):
        nn, ff, *dim_indices = list(np.array(args)[permutation_to])
        return te.sum(
            temp.__getitem__(
                tuple(
                    np.array(
                        [nn, ff // (num_filter // groups) * (in_channel // groups) + rc]
                        + [
                            di * stride + r * dil
                            for di, stride, r, dil in zip(dim_indices, strides, rs, dilations)
                        ]
                    )[permutation_from]
                )
            ).astype(out_dtype)
            * filt.__getitem__(tuple(np.array([ff, rc] + rs)[permutation_from_kernel])).astype(
                out_dtype
            ),
            # Schedules depend on reduction axes being in the same order as the
            # layout, so we reorder here.
            axis=np.array([rc, *rs])[permutation_from_reductions].tolist(),
        )

    return te.compute(
        list(np.array([batch, out_channel] + out_dimensions)[permutation_from]),
        compute,
        # tag is expected to be lowercase
        tag=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
        name=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
    )
This is only conv2d's default compute; python/tvm/topi/nn/conv2d.py defines corresponding compute functions for the various parameter and input-layout combinations.
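As a small usage sketch (using the classic NCHW entry point rather than the layout-generic conv helper), the compute can be instantiated directly on te placeholders and lowered with a naive schedule, which makes the generated loop nest visible:

import tvm
from tvm import te, topi

# NCHW data and OIHW kernel placeholders.
data = te.placeholder((1, 3, 224, 224), name="data")
kernel = te.placeholder((64, 3, 3, 3), name="kernel")

# conv2d_nchw forwards to the conv helper above with the NCHW ordering.
out = topi.nn.conv2d_nchw(data, kernel, 1, 1, 1)

# A default schedule is enough to inspect the loop nest produced by the compute.
s = te.create_schedule(out.op)
print(tvm.lower(s, [data, kernel, out], simple_mode=True))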
Registering the operator's compute function and schedule
After implementing the operator's compute function, it has to be attached to the Relay operator. In TVM this means we not only implement the computation but also provide a matching schedule, i.e. we pick a suitable schedule for the compute; for example, when a 2D convolution is a grouped convolution, it gets its own compute and schedule. conv2d's schedules are defined in python/tvm/topi/generic/nn.py: the functions whose names start with schedule_conv2d_ define the schedule for each data layout, and most of them fall back to the default schedule.
conv2d's strategy function, conv2d_strategy, is defined in python/tvm/relay/op/strategy/generic.py. Based on the layouts of the input data and the convolution kernel, it selects a compute and a schedule for every layout combination; a compute paired with a schedule is a strategy (see the sketch below).
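The sketch below is a simplified illustration, not the verbatim source, of how such a strategy function pairs a compute with a schedule for the plain NCHW/OIHW, non-grouped case; the wrap_* helpers (assumed importable from tvm.relay.op.strategy.generic) adapt TOPI functions to the strategy interface.

from tvm import topi
from tvm.relay import op as _op
from tvm.relay.op.strategy.generic import wrap_compute_conv2d, wrap_topi_schedule

def my_conv2d_strategy(attrs, inputs, out_type, target):
    # Pick an implementation based on the conv attributes (simplified).
    strategy = _op.OpStrategy()
    if attrs.groups == 1 and attrs.data_layout == "NCHW" and attrs.kernel_layout == "OIHW":
        strategy.add_implementation(
            wrap_compute_conv2d(topi.nn.conv2d_nchw),               # compute
            wrap_topi_schedule(topi.generic.schedule_conv2d_nchw),  # schedule
            name="conv2d_nchw.generic",
        )
    return strategy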
With this, our operator has been added to Relay and can be invoked through a Relay Call Node. For that we write an interface that takes the arguments, passes them to the operator, and returns a Relay Call Node, which can then be inserted into Relay's abstract syntax tree.
Calling with Attrs and arguments directly is not supported, so Op::Get is used here to fetch the operator from the operator registry and pass it to the Call Node. The function that builds nn.conv2d's Relay Call Node lives in src/relay/op/nn/convolution_make.h and is exactly the MakeConv function shown above.
When the template parameter is Conv2DAttrs, it produces the Relay Call Node for nn.conv2d: it first creates a Conv2DAttrs object holding the incoming parameters and attributes, then looks up the registered 2D convolution op, passes both to Call, and finally returns a reference to the CallNode instance.
After defining the Call Node builder, we register a Python-facing interface to invoke it, using the TVM_REGISTER_GLOBAL macro. Once registered, it can be called from Python as relay.op._make.xxx(...); the registration for nn.conv2d was shown earlier in this article. An end-to-end sketch follows.
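Putting the pieces together, here is a hedged end-to-end sketch: building and running a Relay module containing nn.conv2d on the CPU, which exercises the Relay op built above, the strategy selection, and the TOPI compute and schedule (shapes and the llvm target are illustrative).

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(64, 3, 7, 7), dtype="float32")
out = relay.nn.conv2d(data, weight, strides=(2, 2), padding=(3, 3),
                      channels=64, kernel_size=(7, 7))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# relay.build picks a compute + schedule for nn.conv2d via its strategy.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

dev = tvm.cpu()
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
rt.set_input("weight", np.random.rand(64, 3, 7, 7).astype("float32"))
rt.run()
print(rt.get_output(0).shape)  # (1, 64, 112, 112)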
References:
https://zhuanlan.zhihu.com/p/368940120
https://blog.csdn.net/zx_ros/article/details/123526147