Web Scraping with Python: Basic Usage of the re Library (What Python's re Library Does)

ccvgpt 2024-07-20 11:56:35 Basic Tutorials

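Main functions of the re library

re.search(pattern, string, flags=0)

Scans the string and returns a match object for the first place where the pattern matches, or None if nothing matches.

pattern: the regular expression, given as a string or raw string
string: the string to search
flags: control flags that modify how the regular expression is applied

Example:

import re

match = re.search(r'[1-9]\d{5}', 'BIT 100081')
if match:
    print(match.group(0))
Output: 100081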

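The flags parameter

re.S (also spelled re.DOTALL): by default, the . operator matches any character except a newline; with re.S, the . operator matches every character, newlines included.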
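To make the effect of re.S concrete, here is a minimal sketch. The sample text and variable names are made up for illustration and do not come from the original article.

```python
import re

# Hypothetical two-line sample text.
text = "first line\nsecond line"

# Without a flag, '.' cannot cross the newline, so the search fails.
no_flag = re.search(r"first.*second", text)

# With re.S, '.' also matches '\n', so the pattern can span both lines.
with_flag = re.search(r"first.*second", text, flags=re.S)

print(no_flag)                    # None
print(repr(with_flag.group(0)))   # 'first line\nsecond'
```
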

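re.match(pattern, string, flags=0)

Tries to match the pattern only at the beginning of the string.

match = re.match(r'[1-9]\d{5}', 'BIT 100081')
if match:
    print(match.group(0))
Nothing is printed: the string starts with 'BIT' rather than a digit, so re.match returns None. Note the if statement. Without checking that a match object was actually returned, calling match.group() later in the code raises an AttributeError.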
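The difference between re.match and re.search, and the None check, can be sketched as follows. The helper first_zipcode is a hypothetical name introduced only for this illustration.

```python
import re

pattern = r'[1-9]\d{5}'

# re.match only tries position 0, so the leading 'BIT ' makes it fail.
at_start = re.match(pattern, 'BIT 100081')
print(at_start)                 # None

# The same pattern matches when the digits come first.
m = re.match(pattern, '100081 BIT')
print(m.group(0))               # 100081


def first_zipcode(text):
    """Hypothetical helper: return the first 6-digit code, or None."""
    found = re.search(pattern, text)
    # Guard against None before calling .group().
    return found.group(0) if found else None


print(first_zipcode('BIT 100081'))   # 100081
print(first_zipcode('no digits'))    # None
```

Returning None instead of raising keeps the caller in control of what a missing match means.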

re.findall(pattern, string, flags=0)

ls = re.findall(r'[1-9]\d{5}', 'BIT 100081 TSU100084')
print(ls)
Output: ['100081', '100084']
All matching substrings are returned as a list.
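One detail the example above does not show: when the pattern contains capturing groups, re.findall returns the groups rather than the whole match. A short sketch, with a two-group pattern that is my own variation on the article's pattern:

```python
import re

text = 'BIT 100081 TSU100084'

# No groups: each list item is the whole match.
codes = re.findall(r'[1-9]\d{5}', text)
print(codes)    # ['100081', '100084']

# Two capturing groups: each list item is a tuple of the groups.
pairs = re.findall(r'([A-Z]+)\s*([1-9]\d{5})', text)
print(pairs)    # [('BIT', '100081'), ('TSU', '100084')]
```
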

re.split(pattern, string, maxsplit=0, flags=0)

maxsplit: the maximum number of splits; whatever remains of the string is returned as the last element

ls = re.split(r'[1-9]\d{5}', 'BIT 100081 TSU100084')
print(ls)
Output: ['BIT ', ' TSU', '']

With maxsplit=1:

ls = re.split(r'[1-9]\d{5}', 'BIT 100081 TSU100084', maxsplit=1)
print(ls)
Output: ['BIT ', ' TSU100084']
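The results above contain trailing spaces and an empty string, which usually need cleaning up. The sketch below shows one way to do that, and also that wrapping the pattern in a capturing group keeps the separators in the result. The variable names are illustrative only.

```python
import re

text = 'BIT 100081 TSU100084'

# Drop empty pieces and surrounding whitespace after splitting.
raw = re.split(r'[1-9]\d{5}', text)
names = [piece.strip() for piece in raw if piece.strip()]
print(names)        # ['BIT', 'TSU']

# A capturing group in the pattern keeps the separators in the list.
with_codes = re.split(r'([1-9]\d{5})', text)
print(with_codes)   # ['BIT ', '100081', ' TSU', '100084', '']
```
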

re.finditer(pattern, string, flags=0)

Returns an iterator that yields one match object per match.

for m in re.finditer(r'[1-9]\d{5}', 'BIT 100081 TSU100084'):
    if m:
        print(m.group(0))
Output:
100081
100084
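Because re.finditer yields full match objects, it is handy when the position of each match matters as well as its text. A minimal sketch (the list name hits is made up here); note that the iterator only ever yields real match objects, so the if m check is a harmless extra:

```python
import re

text = 'BIT 100081 TSU100084'

# Collect the matched text together with its (start, end) position.
hits = [(m.group(0), m.span()) for m in re.finditer(r'[1-9]\d{5}', text)]
print(hits)   # [('100081', (4, 10)), ('100084', (14, 20))]
```
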

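re.sub(pattern, repl, string, count=0, flags=0)

repl: the string that replaces each match

count: the maximum number of replacements (the default 0 replaces every match)

result = re.sub(r'[1-9]\d{5}', ':zipcode', 'BIT 100081 TSU 100084')
print(result)
Output: BIT :zipcode TSU :zipcode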
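The count parameter is described above but not demonstrated, so here is a small sketch. It also shows that repl may be a function that receives each match object, which the original article does not cover; the lambda is only an illustration.

```python
import re

text = 'BIT 100081 TSU 100084'

# count=1 replaces only the first match.
first_only = re.sub(r'[1-9]\d{5}', ':zipcode', text, count=1)
print(first_only)   # BIT :zipcode TSU 100084

# repl can also be a function called once per match object.
bracketed = re.sub(r'[1-9]\d{5}', lambda m: '<' + m.group(0) + '>', text)
print(bracketed)    # BIT <100081> TSU <100084>
```
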

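Equivalent ways to use the re library

Functional style: a one-off call

rst = re.search(r'[1-9]\d{5}', 'BIT 100081')

Object-oriented style: compile once, then reuse

pat = re.compile(r'[1-9]\d{5}')
rst = pat.search('BIT 100081')

regex = re.compile(pattern, flags=0) compiles the string form of a regular expression into a regular expression object.

Note: [1-9]\d{5} on its own is just a string that follows regular expression syntax. Only the compiled result, re.compile(r'[1-9]\d{5}'), is a regular expression object.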
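A compiled object pays off when the same pattern is applied to many strings. The following sketch assumes a small made-up list of input lines; zip_pat and lines are illustrative names.

```python
import re

# Compile once...
zip_pat = re.compile(r'[1-9]\d{5}')

# ...then reuse the object on several strings (hypothetical sample data).
lines = ['BIT 100081', 'TSU100084', 'no code here']
found = [zip_pat.findall(line) for line in lines]
print(found)             # [['100081'], ['100084'], []]

# The original pattern string is still available on the object.
print(zip_pat.pattern)   # [1-9]\d{5}
```

The compiled object offers the same operations as the module-level functions (search, match, findall, split, finditer, sub), just without the pattern argument.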

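The match object

Checking the type of a match object

match = re.search(r'[1-9]\d{5}', 'BIT 100081')
print(type(match))
Output: <class 're.Match'> (Python 3.7 and later; older versions print <class '_sre.SRE_Match'>)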

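Attributes of the match object: string, re, pos, endpos

Common methods of the match object: group(0), start(), end(), span()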

Example:

match = re.search(r'[1-9]\d{5}', 'BIT 100081')
print(match.re)
Output: re.compile('[1-9]\\d{5}')
print(match.pos)
Output: 0
print(match.endpos)
Output: 10
print(match.string)
Output: BIT 100081
print(match.group(0))
Output: 100081
Note: a match object holds the result of a single match only. To get a match object for every match, use re.finditer().
print(match.start())
Output: 4
print(match.end())
Output: 10
print(match.span())
Output: (4, 10)
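To tie these attributes and methods together, here is a short sketch showing how span() relates to the original string and how pos is set. It relies on the optional start-position argument of a compiled pattern's search method, which the article itself does not use; the variable names are illustrative.

```python
import re

text = 'BIT 100081 TSU100084'
pat = re.compile(r'[1-9]\d{5}')

first = pat.search(text)
# span() is (start(), end()); slicing the string with it
# gives back the matched text.
start, end = first.span()
print(text[start:end])     # 100081

# A compiled pattern's search() accepts a start position, recorded in .pos.
# Starting at the end of the first match finds the next one.
second = pat.search(text, first.end())
print(second.group(0), second.pos, second.span())   # 100084 10 (14, 20)
```
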

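Greedy and minimal matching

By default the re library matches greedily, that is, it returns the longest matching substring.

match = re.search(r'PY.*N', 'PYANBNCNDNENFNGN')
print(match.group(0))
Output: PYANBNCNDNENFNGN

Minimal matching

A small change to the regular expression gives the minimal match:

match = re.search(r'PY.*?N', 'PYANBNCNDNENFNGN')
print(match.group(0))
Output: PYAN

To get minimal matching in the re library, extend a quantifier by appending ? to it: *?, +?, ?? and {m,n}?.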
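The sketch below runs greedy and minimal versions of several quantifiers against the same string so the results can be compared side by side. The {3,10} bounds are arbitrary values chosen for illustration.

```python
import re

text = 'PYANBNCNDNENFNGN'


def matched(pattern):
    """Return the text matched by pattern in the sample string."""
    return re.search(pattern, text).group(0)


# Greedy quantifiers take as much as they can.
print(matched(r'PY.*N'))         # PYANBNCNDNENFNGN
print(matched(r'PY.{3,10}N'))    # PYANBNCNDNEN

# Adding ? makes each quantifier take as little as it can.
print(matched(r'PY.*?N'))        # PYAN
print(matched(r'PY.+?N'))        # PYAN
print(matched(r'PY.??N'))        # PYAN
print(matched(r'PY.{3,10}?N'))   # PYANBN
```
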
