
ICLR2019:(Slimmable)SLIMMABLE NEURAL NETWORKS

2022-08-28 22:32:10 Author: Internet

Institute：University of Illinois at Urbana-Champaign
Author：Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang
GitHub：https://github.com/JiahuiYu/slimmable_networks

Introduction

　　(1) Different devices have drastically different runtimes for the same neural network.

　　(2) The availability of hardware resources on the same device even changes greatly over different times.

　　(3) In contrast to width (number of channels), reducing depth cannot reduce memory footprint in inference. (Note: the network's computation graph depends on the width configuration.)

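　　Research question: given a resource budget, how can a network trade off runtime latency against accuracy instantly, adaptively and efficiently? To answer this, the paper proposes the slimmable neural network, which has the following advantages: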

　　(1) For different conditions, a single model is trained, benchmarked and deployed.

　　(2) A near-optimal trade-off can be achieved by running the model on a target device and adjusting active channels accordingly.

　　(3) The solution is generally applicable to (normal, group, depthwise-separable, dilated) convolutions, fully-connected layers, pooling layers and many other building blocks of neural networks. It is also generally applicable to different tasks including classification, detection, identification, image restoration and more.

　　(4) In practice, it is straightforward to deploy on mobiles with existing runtime libraries.
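　　Advantage (2) can be sketched as a simple selection rule: measure the latency of each width on the target device, then use the widest switch that fits the budget. This is only a minimal sketch. The function name `pick_width` and the latency numbers are hypothetical placeholders, not values from the paper, and the rule assumes that accuracy does not decrease as width grows.

```python
def pick_width(latency_ms_by_width, budget_ms):
    """Return the largest width whose measured latency fits the budget.

    Returns None when no width is fast enough.
    Assumption: a wider switch is never less accurate than a narrower one.
    """
    feasible = [w for w, t in latency_ms_by_width.items() if t <= budget_ms]
    return max(feasible) if feasible else None


# Hypothetical on-device measurements in milliseconds (placeholder numbers).
measured = {0.25: 6.0, 0.5: 11.0, 0.75: 19.0, 1.0: 30.0}

chosen = pick_width(measured, budget_ms=20.0)
print(chosen)
```

　　Because all widths live in one set of weights, re-running this selection when the budget changes only changes which channels are active; no second model has to be loaded.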

RELATED WORK

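　　Model Pruning and Distilling. A small model is trained using the soft targets and intermediate-layer representations of a large model.

　　Adaptive Computation Graph. The computation graph of the neural network is constructed adaptively.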

　　Conditional Normalization.

Method

1.TRAINING

　　Naive Training / Training slimmable neural network:

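　　Naive training shares the Batch Normalization layers across all switches. The total loss of the model is the unweighted sum of the training losses of the sub-networks at the different switches (the back-propagated gradients of all switches are accumulated, and then the weights are updated). However, this naive approach reaches only 0.1% accuracy. The presumed reason: for a single channel, different switches give a different number of input channels from the previous layer, so the aggregated feature has a different mean and variance at each switch, which disrupts the statistics of the shared BN layer.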
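　　The presumed cause can be illustrated with a toy calculation. The sketch below is not the authors' code: it models one output channel of a fully-connected layer with made-up positive weights and random non-negative inputs (both are assumptions), then measures the mean and variance of the pre-BN activation when only the first 25%, 50%, 75% or 100% of the input channels are active.

```python
import random
import statistics

random.seed(0)

NUM_INPUT_CHANNELS = 8
WIDTHS = [0.25, 0.5, 0.75, 1.0]  # width switches
NUM_SAMPLES = 2000

# Assumption: fixed positive weights for one output channel.
weights = [random.uniform(0.5, 1.5) for _ in range(NUM_INPUT_CHANNELS)]

# Assumption: non-negative inputs, e.g. features after a ReLU.
samples = [
    [random.uniform(0.0, 1.0) for _ in range(NUM_INPUT_CHANNELS)]
    for _ in range(NUM_SAMPLES)
]


def pre_bn_stats(width):
    """Mean and variance of the output channel when only the first
    `width` fraction of the input channels is active."""
    active = int(NUM_INPUT_CHANNELS * width)
    outputs = [
        sum(w * x for w, x in zip(weights[:active], sample[:active]))
        for sample in samples
    ]
    return statistics.mean(outputs), statistics.pvariance(outputs)


stats_by_width = {width: pre_bn_stats(width) for width in WIDTHS}
for width, (mean, var) in stats_by_width.items():
    print(f"width {width:.2f}: mean={mean:.3f}  var={var:.3f}")
```

　　In this toy setting the mean of the same output channel grows with every additional group of active input channels, so a single set of running statistics in a shared BN layer cannot match all switches at once.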

　　Comparison with INCREMENTAL TRAINING: