
Brief Notes on the Ten NLP Baseline Papers (7) - deep_nmt



Preface:

If you are unfamiliar with the basic concepts, you can refer here. I have compiled most of the concepts involved in the paper to make it easier to understand.

1. Paper:

Sequence to Sequence Learning with Neural Networks

2. Introduction to BLEU

How do we evaluate the quality of machine translation output?

Human evaluation: people subjectively score the translations.
Advantage: accurate.
Disadvantages: slow and expensive.

Automatic evaluation: a predefined metric scores the machine translation output automatically (a minimal BLEU sketch follows below).
Advantages: reasonably accurate, fast, and free.
Disadvantage: may diverge somewhat from human judgments.
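To make the automatic route concrete, here is a minimal sketch of sentence-level BLEU in Python. It is not the exact implementation behind the paper's numbers (which are corpus-level, and mature implementations such as NLTK's or sacrebleu add smoothing and multi-reference handling); it only shows the core idea of clipped n-gram precision combined with a brevity penalty:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified (clipped)
    n-gram precisions for n = 1..max_n, times a brevity penalty.
    `candidate` and `reference` are lists of tokens."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        # "Modified" precision: clip each candidate n-gram count
        # by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        if overlap == 0:
            return 0.0  # any zero precision drives unsmoothed BLEU to 0
        log_prec_sum += math.log(overlap / sum(cand.values())) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_prec_sum)

print(bleu("the cat sat on the mat".split(),
           "the cat sat on a mat".split()))  # ≈ 0.54
```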

3. Background

4. Abstract

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks.

Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences.

In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure.

Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
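As a rough illustration of that architecture, here is a minimal PyTorch sketch. It is not the authors' implementation: the paper used 4-layer LSTMs with 1000 cells and 1000-dimensional embeddings, while the sizes below are smaller, made-up values.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder sketch: a deep LSTM encodes the source sentence
    into its final (hidden, cell) states -- the fixed-dimensional
    vector -- and a second deep LSTM decodes the target from them."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, layers, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode; keep only the final state as the sentence vector.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the (shifted) target sequence from that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=10_000, tgt_vocab=10_000)
logits = model(torch.randint(0, 10_000, (2, 7)),   # 2 source sentences
               torch.randint(0, 10_000, (2, 9)))   # 2 shifted targets
print(logits.shape)  # torch.Size([2, 9, 10000])
```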

Our main result is that on an English to French translation task from the WMT’14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM’s BLEU score was penalized on out-of-vocabulary words.

Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset.

When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task.
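A hedged sketch of that rescoring step: the paper combines the SMT model's score with the LSTM's score for each hypothesis; `lstm_logprob` below is a hypothetical helper returning log P(hypothesis | source), and the equal-weight interpolation is an assumption.

```python
def rerank(hypotheses, smt_scores, lstm_logprob, alpha=0.5):
    """Rescore an n-best list by interpolating the SMT score with the
    LSTM's log-probability, then return the best hypothesis."""
    combined = [(alpha * smt + (1 - alpha) * lstm_logprob(hyp), hyp)
                for hyp, smt in zip(hypotheses, smt_scores)]
    return max(combined, key=lambda pair: pair[0])[1]

# Toy usage with a dummy scorer standing in for the trained LSTM:
best = rerank(["le chat dort", "le chat dormir"], [-2.1, -1.9],
              lstm_logprob=lambda h: -0.5 * len(h.split()))
print(best)
```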

The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice.

Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
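The reversal trick itself is a one-line preprocessing step: after reversing, the first words of the source end up adjacent to the first words of the target, shortening the dependencies the optimizer has to bridge. A minimal sketch:

```python
def reverse_sources(pairs):
    """Reverse the token order of each source sentence; leave the
    target side untouched."""
    return [(" ".join(src.split()[::-1]), tgt) for src, tgt in pairs]

print(reverse_sources([("I am a student", "je suis étudiant")]))
# [('student a am I', 'je suis étudiant')]
```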

5. Research Significance

Historical significance of Deep NMT

6. Summary

Key points:

Innovations:

Takeaways:

Tags: NLP, nmt, Baseline, model, paper, Seq2Seq, machine translation, sequence, LSTM
Source: https://blog.csdn.net/landian0531/article/details/120777314