Using QuotedString in pyparsing
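I am having conceptual difficulty understanding how to build a pyparsing parser. The steps are: 1) build the parser by combining subclasses of ParserElement, and 2) use the parser to parse a string.
The following example works fine: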
from pyparsing import Word, Literal, alphas, alphanums, delimitedList, QuotedString
name = Word(alphas+"_", alphanums+"_")
field = name
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')
dstring = '<Begin>abc,de34,f_o_o**End**'
print(doc.parseString(dstring))
This produces the expected token sequence:
['<Begin>', 'abc', 'de34', 'f_o_o', '**End**']
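However, the class QuotedString (for example) does not take a ParserElement as an argument, so it cannot be combined with other elements in the same way. I would like to use it in the example above, for instance: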
name = Word(alphas+"_", alphanums+"_")
field = QuotedString(name) ### Wrong: doesn't allow "name" as an argument
fieldlist = delimitedList(field)
to parse a document of the form:
dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
Since it cannot be used this way, what is the correct syntax for including QuotedString when constructing a parser for a list of quoted strings?
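For reference, here is a minimal sketch of what QuotedString does accept: the quote character is passed as a plain string, and the resulting element can be handed to delimitedList like any other. The variable names `result` and `loose` are only illustrative. Note that the quoted body is not checked against `name` or any other expression, which is the limitation the question is about.

```python
from pyparsing import Literal, QuotedString, delimitedList

# QuotedString takes the quote character(s) as a string, not a ParserElement
field = QuotedString('"')
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
result = doc.parseString(dstring)
print(result)

# The text between the quotes is not validated as an identifier,
# so a body such as f_o#o is accepted as well.
loose = doc.parseString('<Begin>"f_o#o"**End**')
print(loose)
```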
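======== Edit ============
See the answer below…
Solution:
QuotedString by itself is not suited to this task, because it takes quote characters rather than a parser expression for its contents. However, the Or operator (^) achieves the same effect: it allows several quoting styles while keeping the ability to validate the string inside the quotes. The following code does this: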
from pyparsing import Word, Literal, alphas, alphanums, delimitedList
from pyparsing import Group, ParseException, Suppress
name = Word(alphas+"_", alphanums+"_")
# "+" binds tighter than "^", so each line below is one complete alternative.
# Parentheses replace the backslash continuations, which cannot be followed by comments.
field = (
    Suppress('"') + name + Suppress('"')      # double quote
    ^ Suppress("'") + name + Suppress("'")    # single quote
    ^ Suppress("<") + name + Suppress(">")    # html tag
    ^ Suppress("{{") + name + Suppress("}}")  # django template variable
)
fieldlist = Group(delimitedList(field))
doc = Literal('<Begin>') + fieldlist + Literal('**End**')
dstring = [
    '<Begin>"abc","de34","f_o_o"**End**',        # Good
    '<Begin><abc>,{{de34}},\'f_o_o\'**End**',    # Good
    '<Begin>"abc",\'de34","f_o_o\'**End**',      # Bad - mismatched quotes
    '<Begin>"abc","de34","f_o#o"**End**',        # Bad - invalid identifier
]
for ds in dstring:
    print(ds)
    try:
        print(' ', doc.parseString(ds))
    except ParseException as err:
        # point a caret at the column where parsing failed
        print(" "*(err.column-1) + "^")
        print(err)
This produces the desired output, accepting the two good test strings and rejecting the two bad ones:
<Begin>"abc","de34","f_o_o"**End**
  ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin><abc>,{{de34}},'f_o_o'**End**
  ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin>"abc",'de34","f_o_o'**End**
            ^
Expected "**End**" (at char 12), (line:1, col:13)
<Begin>"abc","de34","f_o#o"**End**
                   ^
Expected "**End**" (at char 19), (line:1, col:20)
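As a possible alternative that is not part of the answer above, QuotedString can be kept if the identifier check is attached as a condition on the parsed result. The sketch below assumes addCondition is available in the installed pyparsing version; the helpers `is_valid_name` and `accepts` are hypothetical names introduced only for illustration. It covers only double and single quotes, not the `<...>` or `{{...}}` forms.

```python
from pyparsing import (Word, Literal, alphas, alphanums, delimitedList,
                       Group, QuotedString, ParseException)

name = Word(alphas + "_", alphanums + "_")

def is_valid_name(tokens):
    # Hypothetical helper: True only if the whole unquoted text parses as `name`.
    # Word skips leading whitespace, so ' abc' would also pass this check.
    try:
        name.parseString(tokens[0], parseAll=True)
    except ParseException:
        return False
    return True

# QuotedString strips the quotes; the condition then validates what is left
field = QuotedString('"') | QuotedString("'")
field.addCondition(is_valid_name, message="invalid identifier")

fieldlist = Group(delimitedList(field))
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

def accepts(text):
    # Hypothetical helper: report whether `doc` parses the text without error
    try:
        doc.parseString(text)
        return True
    except ParseException:
        return False

print(accepts('<Begin>"abc",\'de34\',"f_o_o"**End**'))
print(accepts('<Begin>"abc","de34","f_o#o"**End**'))
```

Compared with the Or-based version, this trades explicit quote alternatives for a second parse of each quoted body, so which one reads better depends on how many quoting styles are needed.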
Thanks to Paul for all the help, and for providing such an excellent package.
Tags: pyparsing, python, parsing. Source: https://codeday.me/bug/20191119/2039231.html