Python Regular Expressions: Tips for String Matching in Python

2023-01-06 21:23:48 Author: Internet

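Regular expressions (regex) are special sequences of characters used to find or match patterns in strings, as explained in our introduction to regex. We've previously shown how to use regular expressions in JavaScript and PHP. This article focuses on Python regex, with the aim of helping you better understand how to work with regular expressions in Python.

You'll learn how to use Python's regex functions and methods effectively in your programs as we cover the nuances involved in handling Python regex objects.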

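Regular Expression Modules in Python: re and regex

Python has two modules, re and regex, that facilitate working with regular expressions. The re module is built into Python, while the regex module was developed by Matthew Barnett and is available on PyPI. Barnett's regex module was developed using the built-in re module, and the two modules have similar functionality. They differ in implementation. The built-in re module is the more popular of the two, so we'll work with it here.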

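Python's Built-in re Module

Python developers typically use the re module when working with regular expressions. The general structure of regex syntax stays the same (the characters and symbols), but the module provides functions and methods for running regex effectively in a Python program.

Before using the re module, we have to import it into our file like any other Python module or library:

import re

This makes the module available in the current file, so that Python's regex functions and methods are easily accessible. With the re module, we can create Python regex objects, manipulate match objects, and apply flags where necessary.

A Selection of re Functions

The re module has functions such as re.search(), re.match(), and re.compile(), which we'll discuss first.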

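re.search(pattern, string, flags=0) vs re.match(pattern, string, flags=0)

re.search() and re.match() both search a string for a Python regex pattern. They return a match object if a match is found, or None if it isn't.

Both functions return only the first match found in the given string, and both default the flags argument to 0. However, while search() scans the whole string to find a match, match() only looks for a match at the beginning of the string.

Python's documentation for re.search():

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

Python's documentation for re.match():

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Let's look at some code examples to clarify further: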

search_result = re.search(r'\d{2}', 'I live at 22 Garden Road, East Legon')
print(search_result)
print(search_result.group())
>>>>
<re.Match object; span=(10, 12), match='22'>
22

match_result = re.match(r'\d{2}', 'I live at 22 Garden Road, East Legon')
print(match_result)
print(match_result.group())
>>>>
None
Traceback (most recent call last):
  File "/home/ini/Dev./sitepoint/regex.py", line 4, in <module>
    print(match_result.group())
AttributeError: 'NoneType' object has no attribute 'group'

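In the example above, re.match() returned None because there's no match at the beginning of the string. Calling the group() method then raised an AttributeError, because there's no match object to call it on. Compare that with a string that does start with two digits: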

match_result = re.match(r'\d{2}', "45 cars were used for the president's convoy")
print(match_result)
print(match_result.group())
>>>>
<re.Match object; span=(0, 2), match='45'>
45

Because 45 sits at the very beginning of the string, the match() method finds it and works just fine.
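Since both functions can return None, it's usually safer to check the result before calling group(). The following is a minimal sketch of that guard; the helper name first_two_digits is made up for illustration and isn't part of the re module.

```python
import re

def first_two_digits(text):
    """Return the two digits at the start of text, or None if there are none.

    first_two_digits is a hypothetical helper used only for this illustration.
    """
    match = re.match(r'\d{2}', text)
    # re.match() returns None when the start of the string doesn't match,
    # so test for that before calling group().
    if match is None:
        return None
    return match.group()

print(first_two_digits("45 cars were used for the president's convoy"))
print(first_two_digits('I live at 22 Garden Road, East Legon'))
```

The first call prints 45 and the second prints None, with no exception raised.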

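re.compile(pattern, flags=0)

The compile() function takes a given regex pattern and compiles it into a regular expression object, which is then used to find matches in a string or text. It also accepts flags as an optional second argument. This is useful because the regex object can be assigned to a variable and reused later in our Python code. Always remember to use a raw string, r"...", when creating a Python regex object.

Here's an example of how it works: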

regex_object = re.compile(r'b[ae]t')
mo = regex_object.search('I bet, you would not let a bat be your president')
print(regex_object)
>>>>
re.compile('b[ae]t')
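To illustrate the point about reuse, here is a small sketch in which one compiled pattern is applied to several strings. The variable names and the two extra sentences are assumptions added for the example.

```python
import re

# Compile once, then reuse the same regex object for every string.
bat_or_bet = re.compile(r'b[ae]t')

sentences = [
    'I bet, you would not let a bat be your president',
    'Nothing to see here',
    'The bat flew away',
]

results = []
for sentence in sentences:
    mo = bat_or_bet.search(sentence)
    # search() returns None when the pattern isn't found.
    results.append(mo.group() if mo else None)

print(results)
```

This prints ['bet', None, 'bat']: the first match in each sentence, or None where the pattern doesn't occur.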

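re.fullmatch(pattern, string, flags=0)

This function takes two required arguments, a string passed as the regex pattern and the string to search, plus an optional flags argument. It returns a match object only if the entire string matches the given regex pattern. If there's no match, it returns None: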

regex_object = re.compile(r'Tech is the future')
mo = regex_object.fullmatch('Tech is the future, join now')
print(mo)
print(mo.group())
>>>>
None
Traceback (most recent call last):
  File "/home/ini/Dev./sitepoint/regex.py", line 16, in <module>
    print(mo.group())
AttributeError: 'NoneType' object has no attribute 'group'

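The code raises an AttributeError because the string as a whole doesn't match the pattern (the trailing ", join now" isn't part of it), so fullmatch() returned None.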
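The difference from match() is easier to see side by side. The sketch below reuses the same pattern and string; the variable names are assumptions for illustration.

```python
import re

pattern = re.compile(r'Tech is the future')
text = 'Tech is the future, join now'

# match() only requires the pattern to match at the start of the string.
prefix = pattern.match(text)
# fullmatch() requires the pattern to consume the entire string.
whole = pattern.fullmatch(text)
exact = pattern.fullmatch('Tech is the future')

print(prefix.group())
print(whole)
print(exact.group())
```

Here match() succeeds on the prefix, fullmatch() returns None for the longer string, and fullmatch() succeeds only when the string is exactly the pattern.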

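re.findall(pattern, string, flags=0)

The findall() function returns a list of all the matching substrings found in a given string (plain strings, not match objects). It traverses the string from left to right until all matches are returned. See the code snippet below: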

regex_object = re.compile(r'[A-Z]\w+')
mo = regex_object.findall('Pick out all the Words that Begin with a Capital letter')
print(mo)
>>>>
['Pick', 'Words', 'Begin', 'Capital']

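In the code snippet above, the regex consists of a character class, [A-Z], followed by one or more word characters, \w+, which ensures that each matched substring begins with a capital letter.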
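One detail worth knowing is that the shape of the list returned by findall() depends on how many capturing groups the pattern has. The sketch below shows the three cases on a made-up string, and uses re.finditer() for the case where match objects are needed.

```python
import re

# Sample text invented for this illustration.
text = 'Offices: +233 54, +234 80'

# No groups: a list of the full matched substrings.
no_groups = re.findall(r'\+\d{3} \d{2}', text)
# One group: a list of what that group captured.
one_group = re.findall(r'\+(\d{3}) \d{2}', text)
# Several groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'\+(\d{3}) (\d{2})', text)
# finditer() yields match objects instead of strings.
from_iter = [m.group() for m in re.finditer(r'\+\d{3} \d{2}', text)]

print(no_groups)
print(one_group)
print(two_groups)
print(from_iter)
```

The first and last lists contain the whole matches, the second contains only the captured country codes, and the third contains one tuple per match.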

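re.sub(pattern, repl, string, count=0, flags=0)

With the help of the sub() function, parts of a string can be replaced with another substring. It takes at least three arguments: the search pattern, the replacement string, and the string to be worked on. If no match is found, the original string is returned unchanged. When no count argument is passed, the function replaces every occurrence of the pattern by default.

Here's an example: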

regex_object = re.compile(r'disagreed')
mo = regex_object.sub('agreed', "The founder and the CEO disagreed on the company's new direction, the investors disagreed too.")
print(mo)
>>>>
The founder and the CEO agreed on the company's new direction, the investors agreed too.
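The count argument limits how many occurrences are replaced. As a small sketch using the same sentence, passing count=1 replaces only the first match; the variable names are assumptions for illustration.

```python
import re

sentence = ("The founder and the CEO disagreed on the company's new direction, "
            "the investors disagreed too.")

# count=1 replaces only the first occurrence; count=0 (the default) replaces all.
first_only = re.sub(r'disagreed', 'agreed', sentence, count=1)
everything = re.sub(r'disagreed', 'agreed', sentence)

print(first_only)
print(everything)
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with flags.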

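re.subn(pattern, repl, string, count=0, flags=0)

The subn() function performs the same operation as sub(), but it returns a tuple containing the new string and the number of replacements made. See the code snippet below: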

regex_object = re.compile(r'disagreed')
mo = regex_object.subn('agreed', "The founder and the CEO disagreed on the company's new direction, the investors disagreed too.")
print(mo)
>>>>
("The founder and the CEO agreed on the company's new direction, the investors agreed too.", 2)

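Match Objects and Methods

A match object is returned when a regex pattern matches a given string in the regex object's search() or match() method. Match objects have several methods that prove useful when working with regex in Python.

Match.group([group1, ...])

This method returns one or more subgroups of a match object. A single argument returns a single subgroup; multiple arguments return multiple subgroups, based on their indexes. By default, the group() method returns the entire matched substring. When an argument to group() is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.

Here's an example: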

regex_object = re.compile(r'(\+\d{3}) (\d{2} \d{3} \d{4})')
mo = regex_object.search('Pick out the country code from the phone number: +233 54 502 9074')
print(mo.group(1))

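The argument 1 passed to the method, as in group(1) in the example above, picks out the country code for Ghana, +233. Calling the method without an argument, or with 0 as the argument, returns the entire matched substring: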

regex_object = re.compile(r'(\+\d{3}) (\d{2} \d{3} \d{4})')
mo = regex_object.search('Pick out the phone number: +233 54 502 9074')
print(mo.group())

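Match.groups(default=None)

groups() returns a tuple of all the subgroups that matched the given string. Regex groups are always captured with parentheses, (), and when there's a match, these groups are returned as the elements of a tuple: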

regex_object = re.compile(r'(\+\d{3}) (\d{2}) (\d{3}) (\d{4})')
mo = regex_object.search('Pick out the phone number: +233 54 502 9074')
print(mo.groups())
('+233', '54', '502', '9074')
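The default parameter of groups() only matters when a group is optional and didn't take part in the match. The sketch below assumes a variant of the phone pattern in which the country code is optional; that pattern and the 'unknown' placeholder are illustrations, not part of the examples above.

```python
import re

# Assumed variant of the phone pattern: the country code group is optional.
phone = re.compile(r'(?:(\+\d{3}) )?(\d{2} \d{3} \d{4})')

with_code = phone.search('Pick out the phone number: +233 54 502 9074')
without_code = phone.search('Pick out the phone number: 54 502 9074')

# group() with several arguments returns a tuple of those subgroups.
print(with_code.group(1, 2))
# A group that didn't participate is reported as None by default...
print(without_code.groups())
# ...or as whatever value is passed as default.
print(without_code.groups(default='unknown'))
```

The last two lines differ only in the first element of the tuple: None versus 'unknown'.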

Match.start([group]) & Match.end([group])

The start() method returns the start index of the match, while the end() method returns its end index:

regex_object = re.compile(r'\s\w+')
mo = regex_object.search('Match any word after a space')
print('Match begins at', mo.start(), 'and ends', mo.end())
print(mo.group())
Match begins at 5 and ends 9
 any

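The example above has a regex pattern that matches a whitespace character followed by one or more word characters. A match, ' any', was found, starting at position 5 and ending at position 9.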
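Both methods also accept a group number, and span() returns the start and end together. A short sketch, reusing the phone number pattern from the earlier sections (the variable names are assumptions):

```python
import re

text = 'Pick out the phone number: +233 54 502 9074'
mo = re.search(r'(\+\d{3}) (\d{2} \d{3} \d{4})', text)

# span() returns (start, end) for the whole match or for a given group.
whole_span = mo.span()
code_span = mo.span(1)

# The indexes can be used to slice the original string.
code_by_slice = text[mo.start(1):mo.end(1)]
number_by_slice = text[mo.start(2):mo.end(2)]

print(whole_span, code_span)
print(code_by_slice, number_by_slice)
```

Slicing the original string with a group's start and end gives back exactly what group() returns for that group.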

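Pattern.search(string[, pos[, endpos]])

The pos value indicates the index position where the search for a match should begin. endpos indicates where the search for a match should stop. The values for both pos and endpos can be passed as arguments to the search() or match() method, after the string. Here's how it works: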

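regex_object = re.compile(r'[a-z]+[0-9]')
# 'python3' occupies indexes 32 to 38, so endpos must be at least 39
mo = regex_object.search('find the alphanumeric character python3 in the string', 20, 40)
print(mo.group())
python3

The code above picks out the first run of lowercase letters followed by a digit within the searched range.

The search starts at string index position 20 and stops at 40.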
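Passing pos is similar to slicing the string first, but not identical. As a small sketch on a made-up string: the ^ anchor still refers to the real start of the string, and the indexes reported by the match object stay relative to the whole string.

```python
import re

# Sample text invented for this illustration.
text = 'use python3'

anchored = re.compile(r'^python')
# With pos=4, '^' still only matches at the real start of the string.
with_pos = anchored.search(text, 4)
# After slicing, 'python3' is a new string, so '^' matches at its start.
with_slice = anchored.search(text[4:])

plain = re.compile(r'python\d')
# Indexes from a pos search are relative to the original string.
start_with_pos = plain.search(text, 4).start()
start_with_slice = plain.search(text[4:]).start()

print(with_pos, with_slice.group())
print(start_with_pos, start_with_slice)
```

So pos is the better choice when the original indexes matter, and slicing is the better choice when anchors should apply to the substring.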

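re Regex Flags

Python allows the use of flags with re module methods such as search() and match(), which gives regular expressions more context. Flags are optional arguments that specify how the Python regex engine looks for a match.

re.I (re.IGNORECASE)

This flag is used for case-insensitive matching. The regex engine ignores whether the letters in the pattern are uppercase or lowercase: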

regex_object = re.search('django', 'My tech stack comprises of python, Django, MySQL, AWS, React', re.I)
print(regex_object.group())
Django

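re.I ensures that a match is found, regardless of whether it's in uppercase or lowercase.

re.S (re.DOTALL)

The '.' special character matches any character except a newline. Introducing this flag makes it match newlines in a block of text or a string as well. First, see the following example without the flag: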

regex_object = re.search('.+', 'What is your favourite coffee flavor \nI prefer the Mocha')
print(regex_object.group())
What is your favourite coffee flavor

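Here the '.' character matches from the beginning of the string and stops at the newline. Introducing the re.DOTALL flag makes it match the newline character too. See the following example: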

regex_object = re.search('.+', 'What is your favourite coffee flavor \nI prefer the Mocha', re.S)
print(regex_object.group())
What is your favourite coffee flavor
I prefer the Mocha

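re.M (re.MULTILINE)

By default, the '^' special character only matches at the beginning of a string. With this flag, it also matches at the beginning of each line. Similarly, the '$' character only matches at the end of the string by default, but the re.M flag ensures it also matches at the end of each line: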

regex_object = re.search(r'^J\w+', 'Popular programming languages in 2022: \nPython \nJavaScript \nJava \nRust \nRuby', re.M)
print(regex_object.group())
JavaScript
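Flags can be combined with the bitwise OR operator, |. As a sketch using the same list of languages, combining re.M with re.I and switching to findall() collects every line that starts with the letter j in either case; the variable names are assumptions for illustration.

```python
import re

languages = ('Popular programming languages in 2022: \n'
             'Python \nJavaScript \nJava \nRust \nRuby')

# re.M lets '^' match at the start of each line;
# re.I lets the lowercase 'j' in the pattern match an uppercase 'J'.
j_languages = re.findall(r'^j\w+', languages, re.M | re.I)

# Without re.M, '^' only matches at the very start of the string.
without_multiline = re.findall(r'^j\w+', languages, re.I)

print(j_languages)
print(without_multiline)
```

The first call returns ['JavaScript', 'Java'], while the second returns an empty list because the string itself starts with "Popular".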

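re.X (re.VERBOSE)

Sometimes, Python regex patterns can get long and messy. The re.X flag helps when we need to add comments within our regex pattern. We can use the ''' string format to create a multiline regex with comments: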

email_regex = re.search(r'''
    [a-zA-Z0-9._%+-]+   # username composed of alphanumeric characters
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name has word characters
    (\.[a-zA-Z]{2,4})   # dot-something
    ''', 'extract the email address in this string kwekujohnson1@gmail.com and send an email', re.X)
print(email_regex.group())
kwekujohnson1@gmail.com

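Practical Examples of Regex in Python

Let's now dive into some more practical examples.

Python Password Strength Test Regex

One of the most popular use cases for regular expressions is testing password strength. When signing up for a new account, there's usually a check to make sure we enter a suitable combination of letters, numbers, and special characters so that the password is strong.

Here's a sample regex pattern for checking password strength: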

password_regex = re.match(r"""
    ^(?=.*?[A-Z])          # this ensures user inputs at least one uppercase letter
    (?=.*?[a-z])           # this ensures user inputs at least one lowercase letter
    (?=.*?[0-9])           # this ensures user inputs at least one digit
    (?=.*?[#?!@$%^&*-])    # this ensures user inputs one special character
    .{8,}$                 # this ensures that password is at least 8 characters long
    """, '@Sit3po1nt', re.X)
print('Your password is', password_regex.group())
>>>>
Your password is @Sit3po1nt

Note the use of ^ and $ to ensure that the entire input string (the password) is what the regex matches.
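In practice, a check like this is often wrapped in a function that returns True or False rather than calling group() directly. The sketch below compiles the same pattern once; the names PASSWORD_RE and is_strong are hypothetical, chosen for this illustration.

```python
import re

# Same rules as the pattern above, compiled once for reuse.
PASSWORD_RE = re.compile(r"""
    ^(?=.*?[A-Z])          # at least one uppercase letter
    (?=.*?[a-z])           # at least one lowercase letter
    (?=.*?[0-9])           # at least one digit
    (?=.*?[#?!@$%^&*-])    # at least one special character
    .{8,}$                 # at least 8 characters long
    """, re.X)

def is_strong(password):
    """Return True if password satisfies every rule in PASSWORD_RE."""
    # match() returns None on failure, so compare against None
    # instead of calling group() on the result.
    return PASSWORD_RE.match(password) is not None

for candidate in ['@Sit3po1nt', 'password', 'Sh0rt!']:
    print(candidate, is_strong(candidate))
```

Only the first candidate passes: 'password' has no uppercase letter, digit, or special character, and 'Sh0rt!' is shorter than eight characters.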

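Python Search and Replace in a File with Regex

Here's our goal for this example:

Create a file called "pangram.txt".
Add a simple line of text to the file: "The five boxing wizards climb quickly."
Write a simple Python regex to search for "climb" and replace it with "jump", so that we have a pangram.

Here's some code for doing that: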

# importing the regex module
import re

file_path = "pangram.txt"
text = "climb"
subs = "jump"

# defining the replace method
def search_and_replace(file_path, text, subs, flags=0):
    with open(file_path, "r+") as file:
        # read the file contents
        file_contents = file.read()
        # escape the search text so that it's matched literally
        text_pattern = re.compile(re.escape(text), flags)
        file_contents = text_pattern.sub(subs, file_contents)
        # rewind, clear the old contents, then write the new contents
        file.seek(0)
        file.truncate()
        file.write(file_contents)

# calling the search_and_replace method
search_and_replace(file_path, text, subs)
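The code above expects pangram.txt to exist already. The following self-contained sketch covers all three steps of the goal by creating the file in a temporary directory first. It uses pathlib and subn() instead of open() and sub(), which is a swapped-in variation rather than the method shown above, and the function name replace_in_file is hypothetical.

```python
import re
import tempfile
from pathlib import Path

def replace_in_file(path, text, subs, flags=0):
    """Replace every literal occurrence of text in the file; return the count.

    replace_in_file is a hypothetical helper written for this illustration.
    """
    path = Path(path)
    # re.escape() makes sure the search text is matched literally.
    pattern = re.compile(re.escape(text), flags)
    new_contents, replacements = pattern.subn(subs, path.read_text(encoding='utf-8'))
    path.write_text(new_contents, encoding='utf-8')
    return replacements

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    pangram_file = Path(tmp_dir) / 'pangram.txt'
    pangram_file.write_text('The five boxing wizards climb quickly.', encoding='utf-8')
    count = replace_in_file(pangram_file, 'climb', 'jump')
    result = pangram_file.read_text(encoding='utf-8')

print(count, result)
```

Returning the replacement count from subn() makes it easy to tell whether the file actually changed.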

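Python Web Scraping Regex

Sometimes you might need to collect some data from the internet or automate a simple task such as web scraping. Regular expressions are very useful for extracting certain data from online sources. Here's an example: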

import re
import urllib.request

phone_number_regex = r'\(\d{3}\) \d{3}-\d{4}'
url = 'https://www.summet.com/dmsi/html/codesamples/addresses.html'
# get response
response = urllib.request.urlopen(url)
# convert response to string
string_object = response.read().decode("utf8")
# use regex to extract phone numbers
regex_object = re.compile(phone_number_regex)
mo = regex_object.findall(string_object)
# print top 5 phone numbers
print(mo[:5])
['(257) 563-7401', '(372) 587-2335', '(786) 713-8616', '(793) 151-6230', '(49
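The extraction step can also be tried without a network connection. The sketch below runs the same phone number pattern against a hard-coded HTML fragment; the fragment and the names in it are invented for illustration and aren't taken from the page above.

```python
import re

# Invented HTML fragment standing in for a downloaded page.
html = ('<ul>'
        '<li>Jane Doe (257) 563-7401</li>'
        '<li>Sam Roe (372) 587-2335</li>'
        '<li>No area code 555-1234</li>'
        '</ul>')

phone_number_regex = re.compile(r'\(\d{3}\) \d{3}-\d{4}')
phone_numbers = phone_number_regex.findall(html)

print(phone_numbers)
```

Only the two numbers written in the (ddd) ddd-dddd format are returned; the third entry has no area code in parentheses, so the pattern skips it.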

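Conclusion

Regular expressions range from simple to complex. They're an important part of programming, as the examples above demonstrate. To better understand regex in Python, it's best to start by getting familiar with things like character classes, special characters, anchors, and grouping constructs.

There's still a lot more to learn about regex in Python, but the re module makes it much easier to get up and running quickly.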
