08 Distributed Computing with MapReduce: Word Frequency Count

Author: Internet


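def getText():
    # Read the whole file and close it automatically when done.
    # Assumes the file is UTF-8 encoded; change the encoding if yours differs.
    with open("D:\\test.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    txt = txt.lower()
    # ASCII and full-width (Chinese) punctuation to strip before splitting.
    punctuation = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~“”?,!【】()、。:;’‘……¥·"""
    for ch in punctuation:
        txt = txt.replace(ch, "")
    return txt


hamletTxt = getText()
words = hamletTxt.split()

# Count how many times each word occurs.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, highest first, and print at most the top 100.
# Slicing avoids an IndexError when the text has fewer than 100 distinct words.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:100]:
    print("{0:<10}{1:>5}".format(word, count))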
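The script above does the whole count in a single process. To connect it to the MapReduce model named in the title, the same count can be split into map, shuffle, and reduce steps. The following is only a minimal single-machine sketch in plain Python, not a real distributed job: the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical, and the sample sentences are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(word, values):
    """Reduce: sum the values collected for one word."""
    return word, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(w, v) for w, v in groups.items())


if __name__ == "__main__":
    # Made-up sample input; in practice the lines would come from the text file.
    sample = ["to be or not to be", "that is the question"]
    counts = word_count(sample)
    for word, count in sorted(counts.items(), key=lambda x: x[1], reverse=True):
        print("{0:<10}{1:>5}".format(word, count))
```

In a real cluster the map calls would run on different nodes, one per input split, and the shuffle would move each word's pairs to the node responsible for reducing it. The structure of the three functions stays the same.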

Source: https://www.cnblogs.com/fkqs/p/15599084.html