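python – Code to replace emoticons with "SAD" or "HAPPY" doesn't work properly
Author: Internet
I want to replace every happy emoticon in a text file with "HAPPY" and, likewise, every sad emoticon with "SAD". But the code doesn't work properly. It does detect the emoticon (only :-) so far), but in the example below it doesn't replace the emoticon with the text. It just appends the text, and for a reason I can't figure out, it appends it twice.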
dict_sad={":-(":"SAD", ":(":"SAD", ":-|":"SAD", ";-(":"SAD", ";-<":"SAD", "|-{":"SAD"}
dict_happy={":-)":"HAPPY",":)":"HAPPY", ":o)":"HAPPY",":-}":"HAPPY",";-}":"HAPPY",":->":"HAPPY",";-)":"HAPPY"}
#THE INPUT TEXT#
a="guys beautifully done :-)"
for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i)==i.decode('utf-8','replace'):
        print i
Input text:
a="guys beautifully done :-)"
Output ("HAPPY" appears twice, and the emoticon is not removed):
guys
-
beautifully
done
HAPPY
HAPPY
:-)
Expected output:
guys
beautifully
done
HAPPY
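Answer:
You are turning each word and each emoticon into a set, which means you are looking for overlap between individual characters rather than comparing whole strings. For the word :-), both :-) and :) pass the subset test, which is why "HAPPY" is printed twice. You probably want exact matching at the very least: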
for i in a.split():
    for j in dict_happy:
        if j == i:
            print "HAPPY"
            break  # stop scanning once an exact match is found
    for k in dict_sad:
        if k == i:
            print "SAD"
            break
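To see the character-set problem concretely, here is a small Python 3 check (the emoticon table is copied from `dict_happy`; the variable names are illustrative). It compares the question's `issubset` test with an exact comparison for the single word `:-)`:

```python
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

word = ":-)"
# The question's test: are all characters of the emoticon present in the word?
set_matches = [e for e in dict_happy if set(e).issubset(set(word))]
# Exact comparison: only the identical emoticon matches
exact_matches = [e for e in dict_happy if e == word]

print(sorted(set_matches))  # [':)', ':-)']
print(exact_matches)        # [':-)']
```

Two keys satisfy the character-set test, so "HAPPY" is printed once per matching key; the exact comparison matches only one.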
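You can iterate over a dictionary directly; there is no need to call .keys() there. You don't actually seem to use the dictionary values, so you could just do: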
for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    if word in dict_sad:
        print "SAD"
You could go further and treat the emoticons as sets. With the existing dictionaries, this can be reduced to:
words = set(a.split())
if dict_happy.viewkeys() & words:
    print "HAPPY"
if dict_sad.viewkeys() & words:
    print "SAD"
This uses a dictionary view of the keys as a set. Still, plain sets would be better:
sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}
words = set(a.split())
if happy_emoticons & words:
    print "HAPPY"
if sad_emoticons & words:
    print "SAD"
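`viewkeys()` only exists in Python 2.7. In Python 3, `dict.keys()` itself returns a set-like view, so the same intersection test can be written as below. This is a sketch reusing the question's dictionaries; the function name `detect_moods` is hypothetical. Like the set version, it reports whether each mood occurs anywhere in the text, not how often, and it does not replace anything.

```python
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD", ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

def detect_moods(text):
    """Return the moods whose emoticons occur as whole words in text (hypothetical helper)."""
    words = set(text.split())
    moods = []
    # In Python 3, dict.keys() is a set-like view that supports the & operator
    if dict_happy.keys() & words:
        moods.append("HAPPY")
    if dict_sad.keys() & words:
        moods.append("SAD")
    return moods

print(detect_moods("guys beautifully done :-)"))  # ['HAPPY']
```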
If you want the emoticons to disappear from the text (replaced by their label), you have to filter the words one by one:
for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    elif word in dict_sad:
        print "SAD"
    else:
        print word
Or better still, combine the two dictionaries and use dict.get():
emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY"
}
for word in a.split():
    print emoticons.get(word, word)
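Here I pass the current word as both the lookup key and the default value: if the current word is not an emoticon, the word itself is printed; otherwise SAD or HAPPY is printed.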
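Since the original goal was to rewrite a text file rather than print one word per line, the same `dict.get` lookup can be joined back into a line. Below is a minimal Python 3 sketch; the helper names `replace_emoticons` and `replace_in_lines` are hypothetical, and `io.StringIO` stands in for a real file. It assumes emoticons are separated from other words by whitespace (something like `done:-)` would not be matched), and `split()`/`join()` collapse runs of whitespace into single spaces.

```python
import io

# Combined lookup table, as in the answer above
emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY",
}

def replace_emoticons(line):
    """Replace whitespace-separated emoticons; other words pass through unchanged."""
    # emoticons.get(word, word): the word itself is the fallback value
    return " ".join(emoticons.get(word, word) for word in line.split())

def replace_in_lines(lines):
    """Apply replace_emoticons to each line of an iterable, e.g. an open file."""
    return [replace_emoticons(line) for line in lines]

# Stand-in for a real file such as open("input.txt", encoding="utf-8")
sample = io.StringIO("guys beautifully done :-)\nwhat a day :( ;-)\n")
for out in replace_in_lines(sample):
    print(out)
# guys beautifully done HAPPY
# what a day SAD HAPPY
```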
Tags: python, nltk, text-processing. Source: https://codeday.me/bug/20190824/1712821.html