
Machine Translation: mosesdecoder

Author: Internet

1. Moses

This article introduces the tokenizer that ships with mosesdecoder.
GitHub repository: https://github.com/moses-smt/mosesdecoder

2. Installation and usage

2.1 Installation

Clone the repository from GitHub:

git clone https://github.com/moses-smt/mosesdecoder.git

2.2 Using the tokenizer

Change into the directory that contains tokenizer.perl:

cd mosesdecoder/scripts/tokenizer/

tokenizer.perl prints the following usage message and options:

Usage ./tokenizer.perl (-l [en|de|...]) (-threads 4) < textfile > tokenizedfile
Options:
  -q     ... quiet.
  -a     ... aggressive hyphen splitting.
  -b     ... disable Perl buffering.
  -time  ... enable processing time calculation.
  -penn  ... use Penn treebank-like tokenization.
  -protected FILE  ... specify file with patters to be protected in tokenisation.
  -no-escape ... don't perform HTML escaping on apostrophy, quotes, etc.
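
The options listed above can also be assembled from a script. The following is a minimal Python sketch, not part of mosesdecoder: the helper names `build_tokenizer_command` and `tokenize_file` are hypothetical, and only flags that appear in the usage message (`-l`, `-threads`, `-q`, `-no-escape`) are used. It assumes `perl` is on the PATH and that the script is run from the tokenizer directory.

```python
import subprocess
from typing import List, Optional


def build_tokenizer_command(
    lang: str = "en",
    no_escape: bool = False,
    quiet: bool = False,
    threads: Optional[int] = None,
    script: str = "./tokenizer.perl",  # assumed path, relative to scripts/tokenizer/
) -> List[str]:
    """Build the argument list for tokenizer.perl (hypothetical helper)."""
    cmd = ["perl", script, "-l", lang]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    if quiet:
        cmd.append("-q")
    if no_escape:
        cmd.append("-no-escape")
    return cmd


def tokenize_file(src_path: str, dst_path: str, **options) -> None:
    """Feed src_path to the tokenizer on stdin and write its stdout to dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        subprocess.run(
            build_tokenizer_command(**options), stdin=src, stdout=dst, check=True
        )
```

For instance, `tokenize_file("input.en", "tokenizedfile.en", lang="en", no_escape=True)` would run the tokenizer on English text with HTML escaping disabled, provided the files and the script exist at those paths.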

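The tokenizer mainly separates punctuation from words; see the source of tokenizer.perl for the exact rules.
For example, take a file input.en with the following content: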

Are you sure you want to cancel the upgrade?
Enemy's march trail's color will turn blue (originally red)
Clicking "Change Appearance" will replace your custom avatar with a default avatar.

Run:

perl ./tokenizer.perl -l en -no-escape < input.en > tokenizedfile.en

The resulting tokenizedfile.en contains:

Are you sure you want to cancel the upgrade ?
Enemy 's march trail 's color will turn blue ( originally red )
Clicking " Change Appearance " will replace your custom avatar with a default avatar .
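
To make the effect concrete, here is a toy Python approximation of what happens to the three example lines. It is a simplified sketch, not the algorithm in tokenizer.perl: the function name `simple_tokenize` is hypothetical, and it only handles the English clitic `'s`, a sentence-final period, and a handful of punctuation marks. The real script also handles non-breaking prefixes, numbers, and many language-specific rules. The `escape` flag illustrates a subset of the HTML escaping that `-no-escape` turns off.

```python
import re

# Subset of the HTML escapes applied when -no-escape is not given.
# "&" must be replaced first so later entities are not double-escaped.
ESCAPES = [("&", "&amp;"), ("'", "&apos;"), ('"', "&quot;")]


def simple_tokenize(line: str, escape: bool = False) -> str:
    """Toy English tokenizer; an illustration, not the Moses algorithm."""
    text = line.strip()
    # Split a sentence-final period off the last word.
    text = re.sub(r"\.$", " .", text)
    # Split the clitic 's off the preceding word: Enemy's -> Enemy 's
    text = re.sub(r"(\w)'s\b", r"\1 's", text)
    # Surround the remaining punctuation marks with spaces.
    text = re.sub(r"([?!,;:()\"])", r" \1 ", text)
    # Collapse runs of whitespace into single spaces.
    text = " ".join(text.split())
    if escape:
        for raw, entity in ESCAPES:
            text = text.replace(raw, entity)
    return text


if __name__ == "__main__":
    for sample in (
        "Are you sure you want to cancel the upgrade?",
        "Enemy's march trail's color will turn blue (originally red)",
        'Clicking "Change Appearance" will replace your custom avatar with a default avatar.',
    ):
        print(simple_tokenize(sample))
```

On these three inputs the sketch prints the same lines as shown above. With `escape=True`, the apostrophe in `'s` would come out as `&apos;s`, which is why `-no-escape` is convenient when the tokenized text should stay human-readable.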


Source: https://blog.csdn.net/qq_40837206/article/details/121410594