python – Lemmatizing plural nouns with NLTK and WordNet
Author: Internet
I want to lemmatize using:
from nltk import word_tokenize, sent_tokenize, pos_tag
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

lmtzr = WordNetLemmatizer()
# text is a list of tokens, e.g. the output of word_tokenize
POS = pos_tag(text)

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank POS tag to the WordNet POS the lemmatizer understands
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun for every other tag
        return wordnet.NOUN

# i is the index of the token being lemmatized
lmtzr.lemmatize(text[i], get_wordnet_pos(POS[i][1]))
The problem is that the POS tagger tags "procaspases" as 'NNS', but how do I convert NNS to a WordNet POS? As it stands, "procaspases" is still "procaspaseS" even after the lemmatizer.
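On the tag conversion itself: a prefix-based mapping like the one above already sends 'NNS' to the noun category, since the tag starts with 'N'. The following is a minimal sketch of that mapping without the NLTK import, relying on the fact that wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV are the one-letter strings 'a', 'v', 'n' and 'r'. The names WORDNET_POS_BY_PREFIX and treebank_to_wordnet are made up for this illustration.

```python
# WordNet POS constants in NLTK are one-letter strings:
# wordnet.ADJ == 'a', wordnet.VERB == 'v', wordnet.NOUN == 'n', wordnet.ADV == 'r'
WORDNET_POS_BY_PREFIX = {'J': 'a', 'V': 'v', 'N': 'n', 'R': 'r'}

def treebank_to_wordnet(treebank_tag, default='n'):
    """Map a Penn Treebank tag to a WordNet POS letter by its first character."""
    return WORDNET_POS_BY_PREFIX.get(treebank_tag[:1], default)

# Every noun tag (NN, NNS, NNP, NNPS) lands on 'n'; unknown tags use the default
for tag in ['NN', 'NNS', 'NNP', 'VBD', 'JJR', 'RB', 'DT']:
    print(tag, '->', treebank_to_wordnet(tag))
```

So the NNS tag is probably not what keeps "procaspases" unchanged; the lemmatizer receives the noun POS either way.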
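Solution:
NLTK takes care of most plurals, and not just by deleting a trailing "s". Note that WordNetLemmatizer returns the input word unchanged when it cannot find it in WordNet, which is likely what happens with a specialized term such as "procaspases".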
import nltk
from nltk.stem.wordnet import WordNetLemmatizer

Lem = WordNetLemmatizer()
phrase = 'cobblers ants women boys needs finds binaries hobbies busses wolves'
words = phrase.split()
for word in words:
    # With no POS argument, lemmatize() treats the word as a noun
    lemword = Lem.lemmatize(word)
    print(lemword)
Output:
cobbler
ant
woman
boy
need
find
binary
hobby
bus
wolf
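For a word like "procaspases", the lemmatizer hands back its input because it only returns forms that exist in WordNet. One possible workaround, sketched below, is to strip a final "s" only when the tagger says plural noun and WordNet does not know the word at all. This is a naive heuristic and not part of NLTK; lemmatize_with_fallback and its two callable parameters are hypothetical names, and the toy stand-ins exist only so the sketch runs without the WordNet corpus installed.

```python
def lemmatize_with_fallback(word, treebank_tag, lemmatize, in_wordnet):
    """Lemmatize `word`; strip a final 's' from plural nouns WordNet does not know.

    lemmatize:  callable (word, pos) -> lemma
    in_wordnet: callable (word) -> bool
    """
    pos = {'J': 'a', 'V': 'v', 'N': 'n', 'R': 'r'}.get(treebank_tag[:1], 'n')
    lemma = lemmatize(word, pos)
    is_plural_noun = treebank_tag in ('NNS', 'NNPS')
    # Assumed heuristic: only touch words the lemmatizer left unchanged
    # and that the lexicon does not contain
    if lemma == word and is_plural_noun and not in_wordnet(word):
        if word.endswith('s') and not word.endswith('ss'):
            return word[:-1]
    return lemma

# Toy stand-ins for the real lemmatizer and lexicon lookup (illustration only)
_toy_lexicon = {'women': 'woman', 'series': 'series'}

def toy_lemmatize(word, pos):
    return _toy_lexicon.get(word, word)

def toy_in_wordnet(word):
    return word in _toy_lexicon

for w in ['procaspases', 'women', 'series']:
    print(lemmatize_with_fallback(w, 'NNS', toy_lemmatize, toy_in_wordnet))
```

With NLTK, the two callables could be WordNetLemmatizer().lemmatize and lambda w: bool(wordnet.synsets(w)). The heuristic will still get irregular plurals of unknown words wrong, so treat it as a starting point.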
Tags: wordnet, python, nltk, lemmatization Source: https://codeday.me/bug/20190722/1504611.html