
9_Transformer Model: Attention without RNN



1. Transformer Model


2. Attention for RNN

2.1 Attention for Seq2Seq Model

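As a quick recap of the standard attention mechanism for an RNN-based Seq2Seq model (Bahdanau, Cho, & Bengio, 2015 [1]): the encoder RNN produces hidden states h_1, ..., h_m, the decoder RNN produces states s_j, and each decoding step builds a context vector as a weighted average of all encoder states. A sketch of the usual formulation (the exact align/score function varies by implementation; it is typically a small network or a dot product followed by a softmax):

$$
\begin{aligned}
\alpha_{ij} &= \operatorname{align}(\mathbf{h}_i, \mathbf{s}_j), \qquad \text{normalized so that } \textstyle\sum_{i=1}^{m} \alpha_{ij} = 1, \\
\mathbf{c}_j &= \sum_{i=1}^{m} \alpha_{ij}\,\mathbf{h}_i .
\end{aligned}
$$

The context vector c_j lets the decoder look at the whole input sentence at every step, which avoids the forgetting problem of a plain Seq2Seq model.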

3. Attention without RNN (remove the RNN and keep only attention)

Question: how to remove the RNN while keeping attention? (Attention was originally used together with an RNN; how can we strip the RNN away and keep only the attention mechanism?)

3.1 Attention Layer

We design an attention layer to use in the Seq2Seq model: the RNN is removed, and attention is built directly on the inputs, as sketched below.

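A sketch of the setup, assuming the standard formulation (Vaswani et al., 2017 [3]): the attention layer receives the encoder inputs X = [x_1, ..., x_m] and the decoder inputs X' = [x'_1, ..., x'_t], and has three trainable parameter matrices W_Q, W_K, W_V. Queries come from the decoder inputs; keys and values come from the encoder inputs:

$$
\begin{aligned}
\mathbf{q}_{:j} &= \mathbf{W}_Q\,\mathbf{x}'_j && \text{(query, one per decoder input)} \\
\mathbf{k}_{:i} &= \mathbf{W}_K\,\mathbf{x}_i && \text{(key, one per encoder input)} \\
\mathbf{v}_{:i} &= \mathbf{W}_V\,\mathbf{x}_i && \text{(value, one per encoder input)}
\end{aligned}
$$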

3.1.1 Compute weights and compute the context vector

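Assuming the softmax-of-dot-products form used in the references (Vaswani et al. [3] additionally scale the scores by 1/sqrt(d_k)): the weights for the j-th query are a softmax over its inner products with all keys, and the context vector is the corresponding weighted average of the values.

$$
\begin{aligned}
\boldsymbol{\alpha}_{:j} &= \operatorname{Softmax}\big(\mathbf{K}^{\top}\mathbf{q}_{:j}\big) \in \mathbb{R}^{m},
\qquad \mathbf{K} = [\mathbf{k}_{:1}, \dots, \mathbf{k}_{:m}], \\
\mathbf{c}_{:j} &= \mathbf{V}\,\boldsymbol{\alpha}_{:j} = \sum_{i=1}^{m} \alpha_{ij}\,\mathbf{v}_{:i},
\qquad \mathbf{V} = [\mathbf{v}_{:1}, \dots, \mathbf{v}_{:m}] .
\end{aligned}
$$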

3.1.2 Output of the attention layer

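The output of the attention layer is the matrix of all context vectors, C = [c_{:1}, ..., c_{:t}], one per decoder input; the whole layer is a function Attn(X, X') of the two input sequences. A minimal NumPy sketch under the assumptions above (all names and shapes are illustrative, not from the original post):

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, X_dec, W_Q, W_K, W_V):
    """Attention layer Attn(X, X') without any RNN (illustrative sketch).

    X     : (d, m)  encoder inputs x_1..x_m as columns
    X_dec : (d, t)  decoder inputs x'_1..x'_t as columns
    Returns C : (d_v, t), one context vector c_{:j} per decoder input.
    """
    Q = W_Q @ X_dec                 # queries, one per decoder input
    K = W_K @ X                     # keys,    one per encoder input
    V = W_V @ X                     # values,  one per encoder input
    A = softmax(K.T @ Q, axis=0)    # (m, t); column j is alpha_{:j}
    return V @ A                    # (d_v, t); column j is c_{:j} = V alpha_{:j}
```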

3.2 Attention Layer for Machine Translation

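For machine translation, the encoder inputs are the embedded source-language tokens and the decoder inputs are the embedded target-language tokens generated so far; each context vector c_{:j} summarizes the whole source sentence from the viewpoint of x'_j and is fed to a classifier that predicts the next target word. A toy usage of the sketch from 3.1.2 (shapes and random data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_k, d_v = 8, 4, 4            # embedding / key / value sizes (illustrative)
m, t = 6, 3                      # source length, target prefix length

X     = rng.normal(size=(d, m))  # embedded source sentence x_1..x_m
X_dec = rng.normal(size=(d, t))  # embedded target prefix   x'_1..x'_t
W_Q   = rng.normal(size=(d_k, d))
W_K   = rng.normal(size=(d_k, d))
W_V   = rng.normal(size=(d_v, d))

C = attention(X, X_dec, W_Q, W_K, W_V)   # reuses the attention() sketch above
print(C.shape)                           # (4, 3): one context vector per decoder position
```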

4. Self-Attention without RNN

4.1 Self-Attention Layer

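In a self-attention layer there is only one input sequence X = [x_1, ..., x_m]; queries, keys, and values are all computed from the same inputs (cf. Cheng, Dong, & Lapata, 2016 [2]; Vaswani et al., 2017 [3]), again with trainable matrices W_Q, W_K, W_V:

$$
\mathbf{q}_{:i} = \mathbf{W}_Q\,\mathbf{x}_i, \qquad
\mathbf{k}_{:i} = \mathbf{W}_K\,\mathbf{x}_i, \qquad
\mathbf{v}_{:i} = \mathbf{W}_V\,\mathbf{x}_i .
$$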

4.1.1 Compute weights and compute the context vector

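The weights and context vectors are computed exactly as in 3.1.1, only the queries now also come from X, so every position attends to every position of the same sequence:

$$
\boldsymbol{\alpha}_{:j} = \operatorname{Softmax}\big(\mathbf{K}^{\top}\mathbf{q}_{:j}\big) \in \mathbb{R}^{m},
\qquad
\mathbf{c}_{:j} = \mathbf{V}\,\boldsymbol{\alpha}_{:j},
\qquad j = 1, \dots, m .
$$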

4.1.2 Output of the self-attention layer

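The output is again C = [c_{:1}, ..., c_{:m}], one context vector per input position, and each c_{:j} depends on all of x_1, ..., x_m, not only on x_j. In terms of the attention function sketched in 3.1.2, a self-attention layer is the same computation with both arguments set to X (illustrative, reusing the earlier names):

```python
C_self = attention(X, X, W_Q, W_K, W_V)   # Attn(X, X), shape (d_v, m)
```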

5. Summary

References:

  1. Bahdanau, Cho, & Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
  2. Cheng, Dong, & Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.
  3. Vaswani et al. Attention Is All You Need. In NIPS, 2017.

5.1 Attention Layer

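In short: the attention layer replaces the RNN in a Seq2Seq model. It maps the encoder inputs X and the decoder inputs X' to a set of context vectors, with queries computed from X' and keys/values computed from X (recap of the formulation sketched above):

$$
\mathbf{C} = \operatorname{Attn}(\mathbf{X}, \mathbf{X}'), \qquad
\mathbf{c}_{:j} = \mathbf{V}\operatorname{Softmax}\big(\mathbf{K}^{\top}\mathbf{q}_{:j}\big) .
$$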

5.2 Self-Attention Layer

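A self-attention layer is the same layer applied to a single sequence, so each output position can use information from the entire input:

$$
\mathbf{C} = \operatorname{Attn}(\mathbf{X}, \mathbf{X}) .
$$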
