
Python NLTK code snippet for training a classifier (Naive Bayes) using feature frequency

2019-08-27 14:47:11 Author: Internet

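I was wondering whether someone could help me with a code snippet that demonstrates how to train a Naive Bayes classifier using the feature frequency method rather than feature presence.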

I assume the code below, shown in Chapter 6 of the linked text, builds a feature set using Feature Presence (FP):

def document_features(document): 
    document_words = set(document) 

    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)

    return features

Please advise.

Solution:

In the link you sent, that function is a feature extractor: it only checks whether each of these words is present in a given document.
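To make "presence only" concrete, here is a minimal sketch that runs the same extractor on a hand-made document. The three-word `word_features` list and the sample document are invented for illustration and do not come from the linked chapter.

```python
# Hypothetical vocabulary and document, chosen only for illustration.
word_features = ["good", "bad", "plot"]

def document_features(document):
    """Presence features: True/False per vocabulary word; counts are ignored."""
    document_words = set(document)  # duplicates collapse here
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

doc = ["good", "plot", "good", "acting", "good"]
presence = document_features(doc)
print(presence)
# {'contains(good)': True, 'contains(bad)': False, 'contains(plot)': True}
```

Although "good" occurs three times in the toy document, the feature only records True, which is exactly the information a frequency-based approach would want to keep.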

Here is the complete code, with each line numbered:

1     all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
2     word_features = [w for w, _ in all_words.most_common(2000)]  # keys()[:2000] fails on Python 3

3     def document_features(document): 
4          document_words = set(document) 
5          features = {}
6          for word in word_features:
7               features['contains(%s)' % word] = (word in document_words)
8          return features

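Line 1 builds a frequency distribution over all the words in the corpus, lowercased.

Line 2 takes the 2000 most frequent words.

Line 3 defines the function.

Line 4 takes the document, which is a list of words (I think it has to be a list), and converts that list into a set.

Line 5 declares a dictionary.

Line 6 iterates over the 2000 most frequent words.

Line 7 fills the dictionary: the key is 'contains(theword)' and the value is True if the word is present in the document, otherwise False.

Line 8 returns the dictionary, which shows whether the document contains each of the 2000 most frequent words.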
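For the frequency-based variant the question asks about, one possible sketch is to store a count per word instead of a boolean. This is an assumption about what "feature frequency" should mean here, not code from the linked chapter; `document_features_freq`, the `count(...)` key format, and the toy data are hypothetical names. It relies on `collections.Counter`, which `nltk.FreqDist` subclasses in NLTK 3.

```python
from collections import Counter

# Hypothetical vocabulary; in the real code this would be the top 2000 words.
word_features = ["good", "bad", "plot"]

def document_features_freq(document):
    """Frequency features: how many times each vocabulary word occurs."""
    # Lowercase the tokens so they match a lowercased vocabulary.
    counts = Counter(w.lower() for w in document)
    # Counter returns 0 for missing keys, so absent words get a count of 0.
    return {'count(%s)' % word: counts[word] for word in word_features}

doc = ["Good", "plot", "good", "acting", "good"]
freq = document_features_freq(doc)
print(freq)
# {'count(good)': 3, 'count(bad)': 0, 'count(plot)': 1}
```

A list of `(document_features_freq(d), label)` pairs can be passed to `nltk.NaiveBayesClassifier.train` in the same way as the presence features. One caveat: that classifier treats every feature value as a discrete category, so a count of 3 is just a different value from a count of 2 rather than "more of the same". Binning the counts into a few ranges may help if the raw counts turn out to be too sparse.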

Does this answer your question?

Tags: stanford-nlp,python,nlp,nltk
Source: https://codeday.me/bug/20190827/1741760.html