Paper reading_image caption_Knowledge-Based Systems 2021_Reasoning like Humans: On Dynamic Attention Prior in

2021-09-29 22:32:06 Author: Internet

Reasoning like Humans: On Dynamic Attention Prior in Image Captioning

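In one sentence: the paper brings in the attention from the previous step (ADP) and the whole sentence as input (LLC), and with essentially the same parameter count and computation it improves CIDEr-D by 2.32%.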

Abstract & Conclusion

1. Most conventional deep attention models perform attention operations for each block/step independently, which neglects prior knowledge obtained by previous steps.

2. We propose a novel method, DYnamic Attention PRior (DY-APR), which combines an Attention Distribution Prior (ADP) and a Local Linguistic Context (LLC) through dynamic attention aggregation.

Introduction

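Earlier attention models mostly treat each block/step independently, which leads to two problems:

1. If attention is learned independently (with no prior), the result of searching the full parameter space is not very precise.

2. A large dataset is needed (ps: ImageNet now counts as a small dataset).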

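We observe that "frequently co-occurring words are more likely to appear in the same sentence" (ps: this is presumably an assumption based on ...; do "panda" and "bear" really appear together that often?), which suggests that local context (a linguistic prior) helps word prediction. In addition, because a global attention mechanism tends to over-smooth, we also introduce a prior based on the previous step.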

Attention Distribution Prior (ADP):

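The idea is biologically inspired: people gradually pick out what interests them from a pile of things rather than attending to details right away. The attention distribution of the upper layer serves as an inductive bias for the lower layer (called the upper-layer prior). The upper-layer prior and the current layer's attention are fused dynamically by gate 1, and gate 2 balances the noise introduced by gate 1. This is illustrated in the original figure.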
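The two-gate fusion described above can be sketched roughly as follows. This is only an illustration of the idea, not the paper's formulation: the notes do not give the gating equations, so the function name `fuse_with_prior`, the scalar gate logits, and the convex-combination form of both gates are assumptions made for this example.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash a gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_prior(prior, current_scores, gate1_logit, gate2_logit):
    """Fuse an upper-layer attention prior with the current layer's attention.

    Assumed form (hypothetical, for illustration only):
      gate 1 mixes the prior into the current attention;
      gate 2 pulls the result back toward the current attention,
      limiting how much noise from the prior can get through.
    """
    current = softmax(current_scores)

    # Gate 1: dynamic fusion of the prior and the current attention.
    g1 = sigmoid(gate1_logit)
    mixed = [g1 * p + (1.0 - g1) * c for p, c in zip(prior, current)]

    # Gate 2: balance the fused result against the un-fused attention.
    g2 = sigmoid(gate2_logit)
    return [g2 * m + (1.0 - g2) * c for m, c in zip(mixed, current)]


# Example: the prior focuses on region 0, the current layer is undecided.
prior = [1.0, 0.0, 0.0]
fused = fuse_with_prior(prior, [0.0, 0.0, 0.0], gate1_logit=0.0, gate2_logit=0.0)
```

Because both gates are convex combinations of probability distributions, the fused output still sums to 1. In a trained model the gate logits would be predicted from the layer's features rather than passed in as constants.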

Local Linguistic Context (LLC):

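A word embedding vector is made up of a set of fixed-length chunks (e.g. a 512-dimensional vector = 16 chunks of 32 dimensions). The first #C chunks serve as the local context and are taken from the previous time step; they are called the shift-through-time chunk. They are followed by a chunk obtained from a linear transformation of the current step's word embedding. During backpropagation the local context is updated together with the rest, which is referred to as "local–global attention".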
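The chunk arrangement can be illustrated with a small sketch. It assumes that the first #C chunks of the previous step's vector simply take the place of the first #C chunks of the current step's vector. The helper names `split_chunks` and `shift_through_time` are hypothetical, and the linear transformation of the current embedding is left out because the notes do not specify it.

```python
def split_chunks(vec, chunk_size):
    """Split a flat embedding into equal fixed-length chunks."""
    if len(vec) % chunk_size != 0:
        raise ValueError("embedding length must be a multiple of chunk_size")
    return [vec[i:i + chunk_size] for i in range(0, len(vec), chunk_size)]


def shift_through_time(prev_vec, curr_vec, chunk_size, num_context_chunks):
    """Build a step input whose first chunks come from the previous step.

    Assumption for illustration: the first `num_context_chunks` chunks are
    copied from the previous step (the local context), and the remaining
    chunks are kept from the current step.
    """
    if len(prev_vec) != len(curr_vec):
        raise ValueError("both embeddings must have the same length")
    prev_chunks = split_chunks(prev_vec, chunk_size)
    curr_chunks = split_chunks(curr_vec, chunk_size)
    mixed = prev_chunks[:num_context_chunks] + curr_chunks[num_context_chunks:]
    # Flatten the list of chunks back into a single vector.
    return [x for chunk in mixed for x in chunk]


# The sizes from the notes: 512 dimensions split into chunks of 32.
chunks = split_chunks([0.0] * 512, 32)

# Toy example: 8 dimensions, chunks of 2, one context chunk.
prev = [0, 1, 2, 3, 4, 5, 6, 7]
curr = [10, 11, 12, 13, 14, 15, 16, 17]
step_input = shift_through_time(prev, curr, chunk_size=2, num_context_chunks=1)
```

In the toy example the first two values of `step_input` come from `prev` and the remaining six from `curr`, so each step carries a slice of its predecessor at no extra parameter cost.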