Python: tokenizing a sentence with optional key/value pairs
Author: Internet
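I'm trying to parse lines of text where each line contains a sentence, optionally followed on the same line by some key/value pairs. The key/value pairs are not only optional but also dynamic: the keys are not known in advance. I'm looking for a result like this:
Input: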
"There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
Output:
Values = {'theSentence' : "There was a cow at home.",
          'home' : "mary",
          'cowname' : "betsy",
          'date' : "10-jan-2013"
         }
Input:
"Mike ordered a large hamburger. lastname=Smith store=burgerville"
Output:
Values = {'theSentence' : "Mike ordered a large hamburger.",
          'lastname' : "Smith",
          'store' : "burgerville"
         }
Input:
"Sam is nice."
Output:
Values = {'theSentence' : "Sam is nice."}
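Thanks for any input or pointers. I know the sentences make this look like a homework problem, but I'm just a Python newbie. I know this probably calls for a regex solution, but I'm not the best at regex.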
Solution:
I'd use re.sub:
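import re

s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
d = {}

def add(m):
    # Store the key/value pair, then replace it with nothing.
    d[m.group(1)] = m.group(2)
    return ''

# Whatever is left once the pairs are removed is the sentence.
d['theSentence'] = re.sub(r'(\w+)=(\S+)', add, s).strip()
print(d)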
If you prefer, here is a more compact version:
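# `s` is the original input line, with the key=value pairs still in it.
d = {}
# setdefault() returns the stored value, which is a non-empty string here,
# so `and ''` makes the replacement empty. Unlike plain assignment,
# setdefault() keeps the first value if a key appears more than once.
d['theSentence'] = re.sub(r'(\w+)=(\S+)',
                          lambda m: d.setdefault(m.group(1), m.group(2)) and '',
                          s).strip()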
Or perhaps findall is a better choice:
# Each match is either a key=value pair (groups a, b) or the sentence (group c).
rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
d = {
    a or 'theSentence': (b or c).strip()
    for a, b, c in re.findall(rx, s)
}
print(d)
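To run this over many lines, the re.sub approach can be wrapped in a function. The following is only a sketch: the names parse_line and PAIR_RE are made up for illustration, and it assumes the sentence itself never contains text of the form word=value, since that would be picked up as a pair.

```python
import re

# One key=value pair: a word, an equals sign, then a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=(\S+)')

def parse_line(line):
    """Return a dict holding the sentence plus any key=value pairs on the line."""
    values = {}

    def collect(match):
        # Record the pair and replace it with an empty string.
        values[match.group(1)] = match.group(2)
        return ''

    # The text remaining after all pairs are removed is the sentence.
    values['theSentence'] = PAIR_RE.sub(collect, line).strip()
    return values

lines = [
    "There was a cow at home. home=mary cowname=betsy date=10-jan-2013",
    "Mike ordered a large hamburger. lastname=Smith store=burgerville",
    "Sam is nice.",
]
for line in lines:
    print(parse_line(line))
```

Because the dict is created inside the function, each call starts fresh and nothing leaks between lines, which the module-level d in the snippets above would not guarantee.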
Tags: python, regex, tokenize, text-parsing Source: https://codeday.me/bug/20190529/1177462.html