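python – Lemmatizing a list of words

Author: Internet

I have a list of words in a text file. I want to lemmatize them so that words with the same meaning but different tenses (such as try, tried, and so on) collapse into one form. When I do this, I keep getting an error like TypeError: unhashable type: 'list'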

    results=[]
    with open('/Users/xyz/Documents/something5.txt', 'r') as f:
       for line in f:
          results.append(line.strip().split())

    lemma= WordNetLemmatizer()

    lem=[]

    for r in results:
       lem.append(lemma.lemmatize(r))

    with open("lem.txt","w") as t:
      for item in lem:
        print>>t, item

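How do I lemmatize words that have already been tokenized?

Solution:

The WordNetLemmatizer.lemmatize method expects a single string, but you are passing it a list of strings. That is what raises the TypeError.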
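The "unhashable type" wording suggests the word ends up being used as a dictionary or set key somewhere inside the lemmatizer. The sketch below reproduces the same kind of failure without NLTK, assuming a toy lookup table; `TOY_LEMMAS` and `toy_lemmatize` are hypothetical names used only for illustration and are not part of NLTK.

```python
# Toy stand-in for a lemmatizer: a plain dict lookup (hypothetical, not NLTK code).
TOY_LEMMAS = {"tried": "try", "tries": "try"}


def toy_lemmatize(word):
    # dict.get hashes its key, so the key must be hashable:
    # a str is hashable, a list is not.
    return TOY_LEMMAS.get(word, word)


print(toy_lemmatize("tried"))  # a single string works

try:
    toy_lemmatize(["tried", "tries"])  # a list cannot be hashed
except TypeError as exc:
    print("TypeError:", exc)
```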

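The result of line.split() is a list of strings, and append adds that whole list to results as a single element, so results becomes a list of lists.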
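The difference between append and extend can be seen on a couple of sample lines. This is a minimal sketch; the `lines` list is made-up data standing in for the contents of the text file.

```python
# Made-up sample lines standing in for the contents of the text file.
lines = ["try tried\n", "trying\n"]

appended = []
extended = []
for line in lines:
    tokens = line.strip().split()
    appended.append(tokens)  # adds the whole list as one element
    extended.extend(tokens)  # adds each string individually

print(appended)  # [['try', 'tried'], ['trying']]
print(extended)  # ['try', 'tried', 'trying']
```

Only the second form gives a flat list whose items can be passed one at a time to a function that expects a string.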

You want to use results.extend(line.strip().split()) instead:

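    from nltk.stem import WordNetLemmatizer

    results = []
    with open('/Users/xyz/Documents/something5.txt', 'r') as f:
        for line in f:
            # extend adds each word separately, so results stays a flat list of strings
            results.extend(line.strip().split())

    lemma = WordNetLemmatizer()

    # lemmatize is now called with one string at a time
    lem = map(lemma.lemmatize, results)

    with open("lem.txt", "w") as t:
        for item in lem:
            # write() works on both Python 2 and 3, unlike "print >> t, item"
            t.write(item + "\n")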

Or, refactored so that no intermediate results list is needed:

    from nltk.stem import WordNetLemmatizer

    def words(fname):
        # Generator that yields one word at a time from the file
        with open(fname, 'r') as document:
            for line in document:
                for word in line.strip().split():
                    yield word

    lemma = WordNetLemmatizer()
    lem = map(lemma.lemmatize, words('/Users/xyz/Documents/something5.txt'))
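Since the stated goal is to drop words that share a meaning, the lemmatized words still need to be deduplicated. Below is a minimal sketch of one way to do that while keeping first-seen order. The helper `unique_lemmas` and the toy lookup table are hypothetical and not part of NLTK; the function accepts any callable that maps a word to its lemma, so it can be tried without NLTK installed.

```python
def unique_lemmas(words, lemmatize):
    """Return each distinct lemma once, in order of first appearance."""
    seen = set()
    unique = []
    for word in words:
        lemma_form = lemmatize(word)
        if lemma_form not in seen:
            seen.add(lemma_form)
            unique.append(lemma_form)
    return unique


# Toy lookup table standing in for a real lemmatizer (hypothetical data).
TOY_TABLE = {"tried": "try", "tries": "try", "cats": "cat"}


def toy_lemmatize(word):
    return TOY_TABLE.get(word, word)


print(unique_lemmas(["try", "tried", "cats", "tries", "cat"], toy_lemmatize))
# ['try', 'cat']
```

With NLTK, passing lemma.lemmatize as the second argument should work the same way. Note that lemmatize treats words as nouns unless a pos argument is given, so verb forms such as "tried" may need something like lambda w: lemma.lemmatize(w, pos='v') to be reduced to "try".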

Tags: python, nltk, lemmatization
Source: https://codeday.me/bug/20190528/1168380.html