
9_Transformer Model: Attention without RNN



1. Transformer Model


2. Attention for RNN

2.1 Attention for Seq2Seq Model

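As a quick recap of the standard attention mechanism for an RNN-based Seq2Seq model (Bahdanau, Cho, & Bengio, 2015 [1]): the encoder RNN produces hidden states h_1, ..., h_m, the decoder RNN produces states s_j, and each decoding step builds a context vector as a weighted average of all encoder states. A sketch of the usual formulation (the exact align/score function varies by implementation; it is typically a small network or a dot product followed by a softmax):

$$
\begin{aligned}
\alpha_{ij} &= \operatorname{align}(\mathbf{h}_i, \mathbf{s}_j), \qquad \text{normalized so that } \textstyle\sum_{i=1}^{m} \alpha_{ij} = 1, \\
\mathbf{c}_j &= \sum_{i=1}^{m} \alpha_{ij}\,\mathbf{h}_i .
\end{aligned}
$$

The context vector c_j lets the decoder look at the whole input sentence at every step, which avoids the forgetting problem of a plain Seq2Seq model.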

3. Attention without RNN (remove the RNN and keep only attention)

Question: how to remove the RNN while keeping attention? (Attention was originally used together with an RNN; how can we strip the RNN away and keep only the attention mechanism?)

3.1 Attention Layer

We design an attention layer to use in the Seq2Seq model: the RNN is removed, and attention is built directly on the inputs, as sketched below.

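A sketch of the setup, assuming the standard formulation (Vaswani et al., 2017 [3]): the attention layer receives the encoder inputs X = [x_1, ..., x_m] and the decoder inputs X' = [x'_1, ..., x'_t], and has three trainable parameter matrices W_Q, W_K, W_V. Queries come from the decoder inputs; keys and values come from the encoder inputs:

$$
\begin{aligned}
\mathbf{q}_{:j} &= \mathbf{W}_Q\,\mathbf{x}'_j && \text{(query, one per decoder input)} \\
\mathbf{k}_{:i} &= \mathbf{W}_K\,\mathbf{x}_i && \text{(key, one per encoder input)} \\
\mathbf{v}_{:i} &= \mathbf{W}_V\,\mathbf{x}_i && \text{(value, one per encoder input)}
\end{aligned}
$$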

3.1.1 Compute weights and compute the context vector

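Assuming the softmax-of-dot-products form used in the references (Vaswani et al. [3] additionally scale the scores by 1/sqrt(d_k)): the weights for the j-th query are a softmax over its inner products with all keys, and the context vector is the corresponding weighted average of the values.

$$
\begin{aligned}
\boldsymbol{\alpha}_{:j} &= \operatorname{Softmax}\big(\mathbf{K}^{\top}\mathbf{q}_{:j}\big) \in \mathbb{R}^{m},
\qquad \mathbf{K} = [\mathbf{k}_{:1}, \dots, \mathbf{k}_{:m}], \\
\mathbf{c}_{:j} &= \mathbf{V}\,\boldsymbol{\alpha}_{:j} = \sum_{i=1}^{m} \alpha_{ij}\,\mathbf{v}_{:i},
\qquad \mathbf{V} = [\mathbf{v}_{:1}, \dots, \mathbf{v}_{:m}] .
\end{aligned}
$$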

3.1.2 Output of the attention layer

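The output of the attention layer is the matrix of all context vectors, C = [c_{:1}, ..., c_{:t}], one per decoder input; the whole layer is a function Attn(X, X') of the two input sequences. A minimal NumPy sketch under the assumptions above (all names and shapes are illustrative, not from the original post):

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, X_dec, W_Q, W_K, W_V):
    """Attention layer Attn(X, X') without any RNN (illustrative sketch).

    X     : (d, m)  encoder inputs x_1..x_m as columns
    X_dec : (d, t)  decoder inputs x'_1..x'_t as columns
    Returns C : (d_v, t), one context vector c_{:j} per decoder input.
    """
    Q = W_Q @ X_dec                 # queries, one per decoder input
    K = W_K @ X                     # keys,    one per encoder input
    V = W_V @ X                     # values,  one per encoder input
    A = softmax(K.T @ Q, axis=0)    # (m, t); column j is alpha_{:j}
    return V @ A                    # (d_v, t); column j is c_{:j} = V alpha_{:j}
```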

3.2 Attention Layer for Machine Translation

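For machine translation, the encoder inputs are the embedded source-language tokens and the decoder inputs are the embedded target-language tokens generated so far; each context vector c_{:j} summarizes the whole source sentence from the viewpoint of x'_j and is fed to a classifier that predicts the next target word. A toy usage of the sketch from 3.1.2 (shapes and random data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_k, d_v = 8, 4, 4            # embedding / key / value sizes (illustrative)
m, t = 6, 3                      # source length, target prefix length

X     = rng.normal(size=(d, m))  # embedded source sentence x_1..x_m
X_dec = rng.normal(size=(d, t))  # embedded target prefix   x'_1..x'_t
W_Q   = rng.normal(size=(d_k, d))
W_K   = rng.normal(size=(d_k, d))
W_V   = rng.normal(size=(d_v, d))

C = attention(X, X_dec, W_Q, W_K, W_V)   # reuses the attention() sketch above
print(C.shape)                           # (4, 3): one context vector per decoder position
```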

4. Self-Attention without RNN

4.1 Self-Attention Layer

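In a self-attention layer there is only one input sequence X = [x_1, ..., x_m]; queries, keys, and values are all computed from the same inputs (cf. Cheng, Dong, & Lapata, 2016 [2]; Vaswani et al., 2017 [3]), again with trainable matrices W_Q, W_K, W_V:

$$
\mathbf{q}_{:i} = \mathbf{W}_Q\,\mathbf{x}_i, \qquad
\mathbf{k}_{:i} = \mathbf{W}_K\,\mathbf{x}_i, \qquad
\mathbf{v}_{:i} = \mathbf{W}_V\,\mathbf{x}_i .
$$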

4.1.1 Compute weights and compute the context vector

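The weights and context vectors are computed exactly as in 3.1.1, only the queries now also come from X, so every position attends to every position of the same sequence:

$$
\boldsymbol{\alpha}_{:j} = \operatorname{Softmax}\big(\mathbf{K}^{\top}\mathbf{q}_{:j}\big) \in \mathbb{R}^{m},
\qquad
\mathbf{c}_{:j} = \mathbf{V}\,\boldsymbol{\alpha}_{:j},
\qquad j = 1, \dots, m .
$$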

4.1.2 Output of the self-attention layer

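The output is again C = [c_{:1}, ..., c_{:m}], one context vector per input position, and each c_{:j} depends on all of x_1, ..., x_m, not only on x_j. In terms of the attention function sketched in 3.1.2, a self-attention layer is the same computation with both arguments set to X (illustrative, reusing the earlier names):

```python
C_self = attention(X, X, W_Q, W_K, W_V)   # Attn(X, X), shape (d_v, m)
```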

5. Summary

References:

  1. Bahdanau, Cho, & Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
  2. Cheng, Dong, & Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.
  3. Vaswani et al. Attention Is All You Need. In NIPS, 2017.

5.1 Attention Layer

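In short: the attention layer replaces the RNN in a Seq2Seq model. It maps the encoder inputs X and the decoder inputs X' to a set of context vectors, with queries computed from X' and keys/values computed from X (recap of the formulation sketched above):

$$
\mathbf{C} = \operatorname{Attn}(\mathbf{X}, \mathbf{X}'), \qquad
\mathbf{c}_{:j} = \mathbf{V}\operatorname{Softmax}\big(\mathbf{K}^{\top}\mathbf{q}_{:j}\big) .
$$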

5.2 Self-Attention Layer

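A self-attention layer is the same layer applied to a single sequence, so each output position can use information from the entire input:

$$
\mathbf{C} = \operatorname{Attn}(\mathbf{X}, \mathbf{X}) .
$$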
