Getting the Python 3.6 handling of zero-length matches in re.sub() with Python 3.7
The handling of zero-length matches changed in Python 3.7. Consider the following with Python 3.6 (and earlier):
>>> import re
>>> print(re.sub('a*', 'x', 'bac'))
xbxcx
>>> print(re.sub('.*', 'x', 'bac'))
x
With Python 3.7 we get the following:
>>> import re
>>> print(re.sub('a*', 'x', 'bac'))
xbxxcx
>>> print(re.sub('.*', 'x', 'bac'))
xx
I understand that this is the standard behavior of PCRE. Moreover, re.finditer() seems to have always detected the additional match:
>>> for m in re.finditer('a*', 'bac'):
...     print(m.start(0), m.end(0), m.group(0))
...
0 0
1 2 a
2 2
3 3
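To make the link between these matches and the re.sub() result concrete, here is a minimal sketch that rebuilds a substitution from the finditer() matches. The helper name sub_via_finditer is made up for illustration, and the sketch only handles string replacements. On Python 3.7 and later it should agree with re.sub() for the pattern above, because every reported match is replaced, including the empty one right after 'a'.

```python
import re

def sub_via_finditer(pattern, repl, string):
    """Rebuild a substitution by replacing every match reported by finditer()."""
    out = []
    pos = 0  # end of the text already copied to the output
    for m in re.finditer(pattern, string):
        out.append(string[pos:m.start()])  # text between two matches
        out.append(m.expand(repl))         # replacement, with backreferences expanded
        pos = m.end()
    out.append(string[pos:])               # trailing text after the last match
    return ''.join(out)

print(sub_via_finditer('a*', 'x', 'bac'))  # 'xbxxcx' on Python 3.7+
```

The empty match at position 2 is the one that Python 3.6's re.sub() left unreplaced.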
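That said, I am interested in getting the Python 3.6 behavior back (this is for a hobby project that implements sed in Python).
I came up with the following solution: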
import re

def sub36(regex, replacement, string):
    compiled = re.compile(regex)

    class Match(object):
        def __init__(self):
            self.prevmatch = None  # previous match seen by the callback

        def __call__(self, match):
            try:
                # Drop an empty match that starts where the previous match ended.
                if match.group(0) == '' and self.prevmatch and match.start(0) == self.prevmatch.end(0):
                    return ''
                else:
                    # Match.expand() is the public counterpart of the private re._expand().
                    return match.expand(replacement)
            finally:
                self.prevmatch = match

    return compiled.sub(Match(), string)
This gives:
>>> print(re.sub('a*', 'x', 'bac'))
xbxxcx
>>> print(sub36('a*', 'x', 'bac'))
xbxcx
>>> print(re.sub('.*', 'x', 'bac'))
xx
>>> print(sub36('.*', 'x', 'bac'))
x
However, this seems very elaborate for examples this simple.
What is the proper way to get the Python 3.6 behavior of re.sub() for zero-length matches in Python 3.7?
Answer:
Your solution may lie in the regex package:
Introduction to regex
This regex implementation is backwards-compatible with the standard
‘re’ module, but offers additional functionality.
The re module’s behaviour with zero-width matches changed in Python
3.7, and this module will follow that behaviour when compiled for Python 3.7.
Installation:
pip install regex
Usage:
With regex, you can specify the version (V0 or V1) under which a regular expression pattern is compiled, i.e.:
# regex running on Python 3.7 and later
>>> import regex
>>> regex.sub('.*', 'x', 'test')
'xx'
>>> regex.sub('.*?', '|', 'test')
'|||||||||'
# regex running on Python 3.6 and earlier
>>> import regex
>>> regex.sub('(?V0).*', 'x', 'test')
'x'
>>> regex.sub('(?V1).*', 'x', 'test')
'xx'
>>> regex.sub('(?V0).*?', '|', 'test')
'|t|e|s|t|'
>>> regex.sub('(?V1).*?', '|', 'test')
'|||||||||'
Note:
Version can be indicated by the VERSION0 or V0 flag, or (?V0) in the pattern.
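If adding a dependency is not an option, the old scanning rule can also be approximated with the standard library alone. The sketch below is an assumption about how re.sub() behaved before Python 3.7, not code taken from CPython: an empty match is skipped only when it sits exactly at the end of the previous replacement, and after any empty match the scan moves forward by one character. The name sub36_scan is hypothetical, and the sketch has only been checked against the examples shown on this page.

```python
import re

def sub36_scan(pattern, repl, string):
    """Approximate pre-3.7 re.sub() by driving Pattern.search() manually."""
    compiled = re.compile(pattern)
    pieces = []
    copied = 0    # end of the last replaced match
    start = 0     # position where the next search begins
    replaced = 0  # number of replacements made so far
    while start <= len(string):
        m = compiled.search(string, start)
        if m is None:
            break
        b, e = m.span()
        # Assumed old rule: ignore an empty match located right at the end
        # of the previous replacement.
        if not (b == e == copied and replaced > 0):
            pieces.append(string[copied:b])
            pieces.append(repl(m) if callable(repl) else m.expand(repl))
            copied = e
            replaced += 1
        # After an empty match, step one character forward to avoid looping.
        start = e + 1 if b == e else e
    pieces.append(string[copied:])
    return ''.join(pieces)

print(sub36_scan('a*', 'x', 'bac'))    # xbxcx
print(sub36_scan('.*', 'x', 'bac'))    # x
print(sub36_scan('.*?', '|', 'test'))  # |t|e|s|t|
```

Unlike a filter applied to the matches that Python 3.7 reports, this loop never retries a non-empty match at the position of an empty one, which is what keeps the characters in the lazy '.*?' case.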
Sources:
Regex thread – issue2636
regex 2018.11.22
Tags: python-3-6, python, python-3-x, regex, python-3-7. Source: https://codeday.me/bug/20191010/1888374.html