1. Zero-width assertions

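This article is an advanced introduction to Python regular expressions. For the basics, see "Programmer Basics: Python Regular Expressions" (程序员入门基础:python的正则表达式).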

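An assertion can be thought of as a test on a position. Regular expressions have many of them. Common ones are ^ and \A, which match the start of a line or of the string; $ and \Z, which match the end of a line or of the string; and the word boundary \b (with \B as its negation).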
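As a quick illustration of these anchors, here is a minimal sketch. The sample string is made up for the example, and re.MULTILINE is used only to show where ^ and \A differ.

```python
import re

text = "dog eats\ndogma"

# ^ matches at the start of every line when re.MULTILINE is set
line_starts = [m.start() for m in re.finditer(r'^', text, re.MULTILINE)]

# \A matches only at the very start of the string, whatever the flags
string_starts = [m.start() for m in re.finditer(r'\A', text, re.MULTILINE)]

# \b matches between a word character and a non-word character,
# so "dog" inside "dogma" is not reported
whole_word = re.findall(r'\bdog\b', text)

print(line_starts)    # [0, 9]
print(string_starts)  # [0]
print(whole_word)     # ['dog']
```

None of these anchors consumes a character; each one only tests the position it sits at.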

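A zero-width assertion does not match any text itself. It only tests whether a condition holds at the current position, so the width of what it matches is zero, hence the name. Zero-width assertions go by other names as well, such as lookaround or pre-search.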
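The "zero width" part can be seen directly from the span of a match. The sketch below reuses the sample string from the rest of this article; the variable names are only for illustration.

```python
import re

text = "dogeat@homework"

# A bare lookahead succeeds at index 3, yet the match is the empty string
position_only = re.search(r'(?=eat)', text)
print(position_only.span())         # (3, 3)
print(repr(position_only.group()))  # ''

# The assertion checks for "eat" but leaves it out of the result
before_eat = re.search(r'dog(?=eat)', text)
print(before_eat.group())           # dog
```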

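There are four common forms of zero-width assertion: positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind.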

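2. Positive lookahead

Format: (?=exp). Asserts that exp matches immediately to the right of the current position. The text that exp matches is not included in the result.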

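import re

# (?=dog) asserts that "dog" starts at the current position; the assertion consumes nothing, so the literal dogeat still has to match from the same position
print("(?=dog)dogeat:", re.search(r'(?=dog)dogeat', "dogeat@homework"))

# after "dog" is matched, (?=eat) asserts that "eat" comes next; "eat" is then consumed by the literal eat that follows
print("dog(?=eat)eat:", re.search(r'dog(?=eat)eat', "dogeat@homework"))

# (?=homework) requires "homework" immediately after "dog", but "eat" follows, so there is no match
print("dog(?=homework)@homework:", re.search(r'dog(?=homework)@homework', "dogeat@homework"))

Output:

(?=dog)dogeat: <re.Match object; span=(0, 6), match='dogeat'>

dog(?=eat)eat: <re.Match object; span=(0, 6), match='dogeat'>

dog(?=homework)@homework: None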
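A typical use of positive lookahead is to pull out a value only when a given suffix follows it, without including the suffix. The following is a small sketch; the CSS-like string is an invented sample.

```python
import re

css = "width: 120px; height: 3em; margin: 8px"

# \d+ must be followed by "px", but "px" is not part of the match
pixel_values = re.findall(r'\d+(?=px)', css)

print(pixel_values)  # ['120', '8']
```

The 3 in "3em" is skipped because the text after it is "em", so the lookahead fails there.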

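3. Negative lookahead

Format: (?!exp). Asserts that exp does not match immediately to the right of the current position. The assertion contributes nothing to the result.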

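import re

# at the position where "@homework" starts, the text to the right does not begin with "eat", so the match succeeds
print("(?!eat)@homework:", re.search(r'(?!eat)@homework', "dogeat@homework"))

# after "dog", the text to the right does not begin with "cat", so the match succeeds
print("dog(?!cat)eat:", re.search(r'dog(?!cat)eat', "dogeat@homework"))

# after "dog", the text to the right does begin with "eat", so (?!eat) fails and there is no match
print("dog(?!eat)eat@homework:", re.search(r'dog(?!eat)eat@homework', "dogeat@homework"))

Output:

(?!eat)@homework: <re.Match object; span=(6, 15), match='@homework'>

dog(?!cat)eat: <re.Match object; span=(0, 6), match='dogeat'>

dog(?!eat)eat@homework: None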
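Negative lookahead is handy for "everything except" filters. The sketch below keeps file names that do not end in ".tmp". The file names are hypothetical, and a plain str.endswith check would do the same job; the point is only to show the assertion at work.

```python
import re

files = ["report.txt", "notes.tmp", "data.csv", "cache.tmp"]

# (?!.*\.tmp$) is tested at the start of the name: if the whole name
# ends in ".tmp", the assertion fails and re.match returns None
keep = [name for name in files if re.match(r'(?!.*\.tmp$).+', name)]

print(keep)  # ['report.txt', 'data.csv']
```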

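4. Positive lookbehind

Format: (?<=exp). Asserts that exp matches immediately to the left of the current position. The text that exp matches is not included in the result.

Note that for lookbehind assertions, both positive and negative, the check runs in the opposite direction: the engine looks at the text that ends at the current position, that is, to the left of where the assertion sits, rather than at the text that follows.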
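One practical consequence of looking to the left is worth knowing: in the standard library re module, the pattern inside a lookbehind must have a fixed width. The sketch below shows an accepted and a rejected pattern; the variable names are only for illustration.

```python
import re

# Alternatives of the same length are accepted (both are 3 characters wide)
same_width = re.search(r'(?<=dog|cat)eat', "dogeat@homework")
print(same_width.group())  # eat

# A quantifier makes the width variable, so compiling the pattern fails
try:
    re.compile(r'(?<=do+g)eat')
    rejected = False
except re.error as exc:
    rejected = True
    print("rejected:", exc)
```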

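import re

# at the position where "@homework" starts, the text to the left ends with "eat", so the match succeeds
print("(?<=eat)@homework:", re.search(r'(?<=eat)@homework', "dogeat@homework"))

# at the position where "eat" starts, the text to the left ends with "dog", so the match succeeds
print("dog(?<=dog)eat:", re.search(r'dog(?<=dog)eat', "dogeat@homework"))

# at the position where "eat@homework" starts, the text to the left is "dog", not "eat", so there is no match; the assertion would have to be (?<=dog) to succeed
print("dog(?<=eat)eat@homework:", re.search(r'dog(?<=eat)eat@homework', "dogeat@homework"))

Output:

(?<=eat)@homework: <re.Match object; span=(6, 15), match='@homework'>

dog(?<=dog)eat: <re.Match object; span=(0, 6), match='dogeat'>

dog(?<=eat)eat@homework: None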
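Positive lookbehind is often used to extract a value that follows a known prefix while leaving the prefix out. Here is a minimal sketch with an invented receipt string.

```python
import re

receipt = "apple $3, pear $12, 7 bananas"

# \d+ must be preceded by a literal "$", which is not part of the match
prices = re.findall(r'(?<=\$)\d+', receipt)

print(prices)  # ['3', '12']
```

The 7 is skipped because the character to its left is a space, not "$".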

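5. Negative lookbehind

Format: (?<!exp). Asserts that exp does not match immediately to the left of the current position. The assertion contributes nothing to the result.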

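import re

# at the position where "@homework" starts, the text to the left ends with "eat", so (?<!eat) fails and there is no match
print("(?<!eat)@homework:", re.search(r'(?<!eat)@homework', "dogeat@homework"))

# at the position where "eat" starts, the text to the left ends with "dog", so (?<!dog) fails and there is no match
print("dog(?<!dog)eat:", re.search(r'dog(?<!dog)eat', "dogeat@homework"))

# at the position where "eat@homework" starts, the text to the left is "dog", not "eat", so the match succeeds; it would fail with (?<!dog)
print("dog(?<!eat)eat@homework:", re.search(r'dog(?<!eat)eat@homework', "dogeat@homework"))

Output:

(?<!eat)@homework: None

dog(?<!dog)eat: None

dog(?<!eat)eat@homework: <re.Match object; span=(0, 15), match='dogeat@homework'>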
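Negative lookbehind has one common pitfall: the engine may start a match in the middle of the text you meant to exclude. The sketch below uses an invented receipt string and tries to collect numbers that are not prices.

```python
import re

receipt = "apple $3, pear $12, 7 bananas"

# Naive attempt: the "2" in "$12" is preceded by "1", not "$", so it slips through
naive = re.findall(r'(?<!\$)\d+', receipt)
print(naive)       # ['2', '7']

# Also forbidding a digit on the left stops matches from starting mid-number
quantities = re.findall(r'(?<![$\d])\d+', receipt)
print(quantities)  # ['7']
```

Inside the character class, $ is a literal dollar sign, so [$\d] means "a dollar sign or a digit".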