Python Text Analysis Notes
Author: Internet
Handling Chinese stopwords
Download a Chinese stopword list (stopwords.txt) yourself; the code is as follows:
import jieba

# Load the stopword list: one stopword per line in the file
def stopwordslist(filepath):
    stopwords = [line.strip() for line in open(filepath, 'r', encoding='utf-8').readlines()]
    return stopwords

# Segment a sentence with jieba and drop stopwords
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    stopwords = stopwordslist('/root/stopwords.txt')  # path to the stopword file
    outstr = ''
    for word in sentence_seged:
        if word not in stopwords:
            if word != '\t':
                outstr += word
                outstr += " "
    return outstr
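Below is a minimal usage sketch, not part of the original post: it applies seg_sentence line by line to a raw text file and writes the segmented, stopword-filtered result to another file. The paths /root/input.txt and /root/output.txt are hypothetical placeholders.

# Hypothetical input/output paths; adjust to your own files.
inputs = open('/root/input.txt', 'r', encoding='utf-8')
outputs = open('/root/output.txt', 'w', encoding='utf-8')
for line in inputs:
    line_seg = seg_sentence(line)  # space-separated tokens with stopwords removed
    outputs.write(line_seg + '\n')
outputs.close()
inputs.close()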
Source: https://www.cnblogs.com/dalton/p/11354027.html