Translation: Attention Is All You Need
Author: Internet
Attention Is All You Need
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.
The best performing models also connect the encoder and decoder through an attention mechanism.
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
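The abstract only names the mechanism; the operation the paper builds on is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch, with toy shapes and variable names of my choosing rather than anything specified in the abstract:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (num_queries, num_keys) similarities
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ V                  # weighted sum of value vectors

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal(s) for s in [(3, 8), (4, 8), (4, 8)])
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (3, 8)
```

The division by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with extremely small gradients.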
Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU.
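For context on the metric: BLEU is the geometric mean of clipped n-gram precisions (typically n = 1..4) multiplied by a brevity penalty, and reported scores such as 28.4 are corpus-level. The following is a toy sentence-level sketch with naive smoothing, purely to illustrate the formula (real evaluations use corpus-level tooling such as multi-bleu or sacrebleu):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def toy_bleu(candidate, reference, max_n=4):
    """Single-sentence, single-reference BLEU sketch (not a real scorer)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # crude smoothing to avoid log(0)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))  # brevity penalty
    return 100 * bp * geo_mean

cand = "the cat sat on the mat".split()
ref = "the cat is sitting on the mat".split()
# The 4-gram overlap here is zero, so the smoothed toy score is near zero
print(toy_bleu(cand, ref))
```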
On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.
We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Source: https://www.cnblogs.com/wwj99/p/12156301.html