NLTK accuracy: "ValueError: too many values to unpack"
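I'm trying to use the NLTK toolkit to run sentiment analysis on tweets about a new movie. I followed the NLTK "movie_reviews" example and built my own CategorizedPlaintextCorpusReader object. The problem appears when I call nltk.classify.util.accuracy(classifier, testfeats). Here is the code: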
import os
import glob
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

trainfeats = negfeats + posfeats

# Building a custom Corpus Reader
tweets = nltk.corpus.reader.CategorizedPlaintextCorpusReader('./tweets', r'.*\.txt', cat_pattern=r'(.*)\.txt')
tweetsids = tweets.fileids()
testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]

print 'Training the classifier'
classifier = NaiveBayesClassifier.train(trainfeats)

for tweet in tweetsids:
    print tweet + ' : ' + classifier.classify(word_feats(tweets.words(tweetsids)))

classifier.show_most_informative_features()

print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)
Everything seems to work until the last line. That is where I get the error:
>>> nltk.classify.util.accuracy(classifier, testfeats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack
Does anyone see anything wrong in the code?
Thanks.
Solution:
The error message
  File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack
occurs because the items in gold cannot be unpacked into 2-tuples (fs, l):
[fs for (fs,l) in gold] # <-- The ValueError is raised here
You would get the same error if gold were [(1,2,3)], because the 3-tuple (1,2,3) cannot be unpacked into the 2-tuple (fs,l):
In [74]: [fs for (fs,l) in [(1,2)]]
Out[74]: [1]
In [73]: [fs for (fs,l) in [(1,2,3)]]
ValueError: too many values to unpack
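The same failure is not limited to tuples: any item that yields more than two values when iterated triggers it. A dict iterates over its keys, so a feature dict with more than two words fails in the same way. The following is a minimal sketch of that behavior; the dict `feats` is a hypothetical stand-in for the output of word_feats, not data from the question.

```python
# Hypothetical feature dict with three keys, standing in for word_feats(...) output.
feats = {'great': True, 'movie': True, 'boring': True}

try:
    # Unpacking a dict iterates over its keys: three keys, only two targets.
    fs, l = feats
    outcome = 'unpacked'
except ValueError as err:
    outcome = 'ValueError: %s' % err

# A (feature dict, label) pair has exactly two items and unpacks cleanly.
fs, l = (feats, 'pos')
```

After this runs, `outcome` holds the ValueError message, while the final unpacking succeeds because the item is a genuine 2-tuple.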
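gold may be buried in the implementation of nltk.classify.util.accuracy, but the error hints that one of your inputs, classifier or testfeats, has the wrong "shape".
There is nothing wrong with classifier, since calling accuracy(classifier, trainfeats) works: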
In [61]: print 'accuracy:', nltk.classify.util.accuracy(classifier, trainfeats)
accuracy: 0.9675
So the problem must be with testfeats.
Compare trainfeats with testfeats.
trainfeats[0] is a 2-tuple containing a dict and a classification:
In [63]: trainfeats[0]
Out[63]:
({u'!': True,
  u'"': True,
  u'&': True,
  ...
  u'years': True,
  u'you': True,
  u'your': True},
 'neg') # <--- Notice the classification, 'neg'
But testfeats[0] is just a dict, word_feats(tweets.words(fileids=[f])):
testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]
So, to fix this, you need to define testfeats so that it looks like trainfeats: each dict returned by word_feats must be paired with a classification.
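One way to picture the corrected shape is the sketch below. It does not use NLTK or the asker's corpus: `labeled_tweets` is a hypothetical list of (tokens, label) pairs, and the last two lines only reproduce the `(fs, l)` unpacking seen in the traceback rather than calling nltk.classify.util.accuracy itself.

```python
def word_feats(words):
    # Same bag-of-words featurizer as in the question.
    return dict((word, True) for word in words)

# Hypothetical stand-in for the tweets corpus: (tokens, label) pairs.
labeled_tweets = [
    (['loved', 'this', 'movie'], 'pos'),
    (['what', 'a', 'boring', 'film'], 'neg'),
]

# Each test item is now a (feature dict, label) 2-tuple, just like trainfeats.
testfeats = [(word_feats(words), label) for (words, label) in labeled_tweets]

# The same unpacking pattern that appears in the traceback now succeeds.
featuresets = [fs for (fs, l) in testfeats]
labels = [l for (fs, l) in testfeats]
```

With the corpus reader from the question, the label for each file could presumably come from tweets.categories(fileids=[f])[0]. That assumes every file belongs to exactly one category and that the category names match the 'pos' and 'neg' labels used for training, which the cat_pattern in the question may not guarantee.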
Tags: text-classification, python, nltk Source: https://codeday.me/bug/20191009/1877575.html