python - Regular expression recurses too often

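I am trying to write a regular expression that matches an optionally quoted value (valid quote characters are ", ' and `).
The rule is that two consecutive occurrences of the quote character stand for one escaped quote.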

This is the regular expression I came up with:

(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote).)|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)

Here it is in readable form (with comments showing what I had in mind):

(?P<quote>["'`])?                   #named group Quote (any quoting character?)

    (?P<value>                      #name this group "value", what I am interested in
        (?(quote)               #if quoted 
            ((?!(?P=quote).)|((?=(?P=quote)).){2})* #see below
                                    #match either anything that is not the quote
                                    #or match 2 quotes
        |
            [^\s;]*         #match anything that is not whitespace or ; (my separators if there are no quotes)
        )
    )

(?(quote)(?P=quote)|)               #if we had a leading quote we need to consume a closing quote

It works fine for unquoted strings, but quoted strings make it crash:

    match = re.match(regexValue, line)
  File "****/jython2.5.1/Lib/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion depth exceeded

What am I doing wrong?

Edit: sample input => output (for the capture group "value", which is the part I want):

text    => text
'text'  => text
te xt   => te
'te''xt'=> te''xt   #quote=' => strreplace("''","'") => desired result: te'xt
'te xt' => te xt
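
To pin down what the table above asks for, the rules can also be written as a small hand-rolled scanner without any regular expression. This is only a sketch of the stated rules, not the approach from the question: `read_value` is a hypothetical name, and returning `None` for a missing closing quote is an assumption, since the question does not say what should happen in that case.

```python
QUOTES = "\"'`"  # the three quote characters named in the question


def read_value(line):
    """Return the raw value at the start of line; doubled quotes are left in place."""
    if line and line[0] in QUOTES:
        quote = line[0]
        i = 1
        while i < len(line):
            if line[i] == quote:
                if line[i + 1:i + 2] == quote:
                    i += 2          # doubled quote: escaped, skip both characters
                    continue
                return line[1:i]    # single quote: this is the closing quote
            i += 1
        return None                 # assumption: an unterminated quote is "no match"
    # Unquoted value: stop at the first whitespace character or ';'
    i = 0
    while i < len(line) and not line[i].isspace() and line[i] != ";":
        i += 1
    return line[:i]


for sample in ["text", "'text'", "te xt", "'te''xt'", "'te xt'"]:
    print(repr(sample), "=>", repr(read_value(sample)))
```

Collapsing the doubled quotes afterwards is a separate step, for example `value.replace(quote * 2, quote)`.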

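Edit 2: While looking at it again I found a mistake (see below), but I think the question above still stands. It may be a Jython bug, but the pattern still does not do what I want. The difference is very subtle: the dot is moved out of the lookahead group.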

new:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
old:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote).)|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
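
One way to separate a pattern problem from an interpreter problem is to run the revised ("new") pattern against the sample table on a current CPython. The snippet below is a minimal sketch of such a check; `SAMPLES` and `value_of` are names introduced here for illustration, and the result says nothing about how Jython 2.5.1 behaves.

```python
import re

# The revised ("new") pattern from the question, copied unchanged.
NEW_PATTERN = re.compile(
    r"""(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)"""
)

# Input => desired "value" group, taken from the table in the question.
SAMPLES = [
    ("text", "text"),
    ("'text'", "text"),
    ("te xt", "te"),
    ("'te''xt'", "te''xt"),
    ("'te xt'", "te xt"),
]


def value_of(line):
    """Return the 'value' group for a match anchored at the start of line, or None."""
    m = NEW_PATTERN.match(line)
    return m.group("value") if m else None


for line, wanted in SAMPLES:
    print(repr(line), "=>", repr(value_of(line)), "(wanted %r)" % wanted)
```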

Solution:

As suggested in the comments, I recommend staying explicit and writing out all the possibilities:

import re

r = r"""
    ([^"'`]+)
    |
    " ((?:""|[^"])*) "
    |
    ' ((?:''|[^'])*) '
    |
    ` ((?:``|[^`])*) `
"""

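When extracting the matches, you can rely on the fact that only one of the four groups is filled for each match, so you can simply discard the empty groups: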

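r = re.compile(r, re.X)  # re.X: whitespace inside the pattern is ignored
for m in r.findall(''' "fo""o" and 'bar''baz' and `quu````x` '''):
    # at most one of the four groups is non-empty, so joining them yields it
    print(''.join(m))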
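
The loop above leaves the doubled quotes in the extracted text. A possible follow-up, sketched here under the assumption that the caller wants the escapes collapsed as in the question's `strreplace("''","'")` note, is to look at which group matched and replace the doubled quote for that group only. `extract_values` and `GROUP_QUOTES` are hypothetical names, not part of the original answer.

```python
import re

# Same explicit alternatives as in the answer, compiled in verbose mode.
PATTERN = re.compile(r"""
    ([^"'`]+)
    |
    " ((?:""|[^"])*) "
    |
    ' ((?:''|[^'])*) '
    |
    ` ((?:``|[^`])*) `
""", re.X)

# Quote character belonging to groups 1..4 (None for the unquoted alternative).
GROUP_QUOTES = (None, '"', "'", "`")


def extract_values(text):
    """Return (value, was_quoted) pairs with doubled quotes collapsed to one."""
    results = []
    for m in PATTERN.finditer(text):
        index = m.lastindex              # number of the one group that took part
        value = m.group(index)
        quote = GROUP_QUOTES[index - 1]
        if quote is not None:
            value = value.replace(quote * 2, quote)
        results.append((value, quote is not None))
    return results


for value, was_quoted in extract_values(''' "fo""o" and 'bar''baz' and `quu````x` '''):
    print(repr(value), "quoted" if was_quoted else "unquoted")
```

Note that the unquoted alternative in this pattern matches any run of non-quote characters, including whitespace, so the text between quoted values comes back as unquoted pieces.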

Tags: jython, python, regex
Source: https://codeday.me/bug/20191201/2079477.html