POS Tagger (OpenNLP)
Author: reposted from the Internet
5 POS Tagger
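Description: The part-of-speech (POS) tagger marks each token with its word class, based on the token itself and on its context. A token can have several possible POS tags, depending on the token and the context. The OpenNLP POS tagger uses a probability model to predict the correct POS tag from the tag set. To limit the possible tags for a token, a tag dictionary can be used; this improves both the tagging quality and the runtime performance of the tagger.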
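To make the role of the tag dictionary concrete, here is a toy sketch in plain Java. It is not OpenNLP's implementation: the class `TagDictionaryDemo`, the method `bestTag`, and the scores are all hypothetical. It only shows the idea: the model scores every tag, and the dictionary removes tags that are not listed for the word before the best one is chosen.

```java
import java.util.Map;
import java.util.Set;

public class TagDictionaryDemo {

    /**
     * Hypothetical helper: returns the highest-scoring tag for a token, considering
     * only the tags the dictionary allows for it. Tokens missing from the dictionary
     * may receive any tag.
     */
    public static String bestTag(String token, Map<String, Double> tagScores,
                                 Map<String, Set<String>> tagDictionary) {
        Set<String> allowed = tagDictionary.get(token.toLowerCase());
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : tagScores.entrySet()) {
            if (allowed != null && !allowed.contains(e.getKey())) {
                continue; // tag pruned by the dictionary, never considered
            }
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up model scores for the token "book"
        Map<String, Double> scores = Map.of("JJ", 0.45, "NN", 0.35, "VB", 0.20);
        Map<String, Set<String>> dictionary = Map.of("book", Set.of("NN", "VB"));

        System.out.println(bestTag("book", scores, Map.of()));   // JJ (no restriction)
        System.out.println(bestTag("book", scores, dictionary)); // NN (JJ not allowed)
    }
}
```

With the dictionary, "book" gets NN instead of the higher-scoring but disallowed JJ. Fewer candidate tags per token also means less work, which is one way a dictionary can save time.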
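API: The POS tagger training API supports training a new POS model. Three basic steps are needed to train one:
- The application must open a sample data stream
- Call the POSTaggerME.train method
- Save the POSModel to a file or database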
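A minimal sketch of the three steps, assuming OpenNLP 1.8 or later (for `TrainingParameters.defaultParams()`) and a training file in OpenNLP's one-sentence-per-line `word_TAG` format. The class name `TrainPosModel` and the file paths are hypothetical, and lowering the feature cutoff to 1 is an assumption that only makes sense for very small corpora.

```java
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerFactory;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.postag.WordTagSampleStream;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class TrainPosModel {

    /** Trains a POS model from a word_TAG file and writes it to modelFile. */
    public static POSModel train(File trainingFile, File modelFile) throws IOException {
        // Step 1: open a sample data stream. Each line is one sentence such as
        // "The_DT dog_NN barks_VBZ ._."; WordTagSampleStream turns it into a POSSample.
        InputStreamFactory isf = new MarkableFileInputStreamFactory(trainingFile);
        ObjectStream<String> lines = new PlainTextByLineStream(isf, StandardCharsets.UTF_8);

        POSModel model;
        // Closing the sample stream also closes the underlying line stream
        try (ObjectStream<POSSample> samples = new WordTagSampleStream(lines)) {
            // Default maxent settings; cutoff 1 (assumption) keeps features seen only once
            TrainingParameters params = TrainingParameters.defaultParams();
            params.put(TrainingParameters.CUTOFF_PARAM, "1");

            // Step 2: call POSTaggerME.train with the default factory
            model = POSTaggerME.train("en", samples, params, new POSTaggerFactory());
        }

        // Step 3: save the POSModel to a file
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(out);
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths
        train(new File("E:\\pos-train.txt"), new File("E:\\my-pos-model.bin"));
    }
}
```

The returned (or reloaded) model can then be passed to `new POSTaggerME(model)` in the same way as the pre-trained model in the tagging code.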
Create a file named myText.txt in the root of drive E: (E:\myText.txt) with the following content:
Hi. How are you? This is Mike.
Implementation 1:
package package01;
import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
public class Test05 {
    public static void main(String[] args) {
        try {
            Test05.POSTag();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    /**
     * 4. POS Tagger
     * Sample output: Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP
     *
     * https://stackoverflow.com/questions/50668754/the-constructor-plaintextbylinestreamstringreader-is-undefined
     *
     * @throws IOException if the model file or the input file cannot be read
     */
    public static void POSTag() throws IOException {
        // Load the pre-trained English maxent model (prints "Loading POS Tagger model ... done")
        POSModel model = new POSModelLoader().load(new File("E:\\NLP_Practics\\models\\en-pos-maxent.bin"));
        // Reports throughput (sentences per second) on System.err
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);
        // In the OpenNLP version used here, PlainTextByLineStream has no Reader constructor
        // (see the Stack Overflow link above), so this in-memory variant does not compile:
        // String input = "Hi. How are you? This is Mike.";
        // ObjectStream<String> lineStream = new PlainTextByLineStream(new StringReader(input));
        Charset charset = Charset.forName("UTF-8");
        InputStreamFactory isf = new MarkableFileInputStreamFactory(new File("E:\\myText.txt"));
        // try-with-resources closes the stream even if tagging throws
        try (ObjectStream<String> lineStream = new PlainTextByLineStream(isf, charset)) {
            perfMon.start();
            String line;
            while ((line = lineStream.read()) != null) {
                // Splits on whitespace only, so punctuation stays attached (e.g. "you?")
                String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(line);
                String[] tags = tagger.tag(tokens);
                POSSample sample = new POSSample(tokens, tags);
                System.out.println(sample.toString());
                perfMon.incrementCounter(); // counts one "sent" per input line
            }
            perfMon.stopAndPrintFinalResult();
            System.out.println("--------------4-------------");
        }
    }
}
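The commented-out lines in the code rely on a `PlainTextByLineStream(Reader)` constructor that, per the linked Stack Overflow question, is undefined in newer OpenNLP versions. One way to read an in-memory string instead, assuming the same `PlainTextByLineStream(InputStreamFactory, Charset)` constructor that the file version uses, is to supply the bytes through a lambda (`InputStreamFactory` has a single method). The class `StringLineStream` and the method `linesOf` are hypothetical names.

```java
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StringLineStream {

    /** Wraps an in-memory string so it can be read line by line like the file-based stream. */
    public static ObjectStream<String> linesOf(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // Each call creates a fresh stream over the same bytes, so reset() also works
        InputStreamFactory isf = () -> new ByteArrayInputStream(bytes);
        return new PlainTextByLineStream(isf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (ObjectStream<String> lines = linesOf("Hi. How are you?\nThis is Mike.")) {
            String line;
            while ((line = lines.read()) != null) {
                // In POSTag(), tokenize and tag each line here instead of printing it
                System.out.println(line);
            }
        }
    }
}
```

Inside `POSTag()`, `linesOf(input)` could take the place of the `MarkableFileInputStreamFactory` stream; the tagging loop stays unchanged.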
Result:
Loading POS Tagger model ... done (0.566s)
Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP
--------------4-------------
Average: 125.0 sent/s
Total: 1 sent
Runtime: 0.008s
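Each tagged token is printed by `POSSample.toString()` as the token, an underscore, and the tag, so `you?_JJ` means the token `you?` received the tag `JJ` (the whitespace tokenizer keeps punctuation attached). OpenNLP's own `POSSample.parse(String)` reads this format; the stand-alone sketch below does the same split on the last underscore without depending on OpenNLP. The class `TaggedLineParser` and its record are hypothetical, and the record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedLineParser {

    /** A token and the POS tag assigned to it. */
    public record TaggedToken(String token, String tag) {}

    /** Splits a "token_TAG token_TAG ..." line; the tag follows the last underscore. */
    public static List<TaggedToken> parse(String line) {
        List<TaggedToken> result = new ArrayList<>();
        if (line.isBlank()) {
            return result; // nothing to parse
        }
        for (String item : line.trim().split("\\s+")) {
            int split = item.lastIndexOf('_');
            if (split <= 0 || split == item.length() - 1) {
                throw new IllegalArgumentException("Not in token_TAG form: " + item);
            }
            result.add(new TaggedToken(item.substring(0, split), item.substring(split + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String output = "Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP";
        for (TaggedToken t : parse(output)) {
            System.out.println(t.token() + "\t" + t.tag());
        }
    }
}
```

Splitting on the last underscore (rather than the first) keeps tokens that themselves contain underscores intact.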
Source: https://www.cnblogs.com/yuyu666/p/15029748.html