Regular Expressions (regular): Compiled Notes

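To use regular expressions in Python, import the re module ("re" is short for "regular expression"). Regular expressions are a tool for processing strings. Strings often contain information we want to extract, and knowing how to process them makes many tasks much easier.

Regular expressions are used all the time. File processing is very common in Python, files contain strings, and processing those strings often calls for regular expressions, so they are well worth mastering. Let's start with the functions the re module provides: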

(1) match(pattern, string, flags=0)

```python
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
```

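From the docstring above: "Try to apply the pattern at the start of the string, returning a match object, or None if no match was found." In other words, match() tries to match the pattern at the beginning of the string and returns a match object, or None if there is no match.

Key points: (1) matching starts at the beginning of the string; (2) if there is no match, None is returned.

Let's look at a few examples: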

```python
import re

string = "abcdef"
m = re.match("abc", string)   # match "abc" and look at what is returned
print(m)
print(m.group())
n = re.match("abcf", string)  # the pattern does not occur in the string
print(n)
l = re.match("bcd", string)   # the pattern occurs, but not at the start
print(l)
```

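Output:

```
<_sre.SRE_Match object; span=(0, 3), match='abc'>   (1)
abc                                                  (2)
None                                                 (3)
None                                                 (4)
```

Output (1) shows that match() returns a match object (Python 3.7 and later print it as <re.Match object; span=(0, 3), match='abc'>). To get the matched text itself, call group() on it, as in (2). If the pattern does not occur in the string, None is returned (3). Finally, match() only tries to match at the beginning of the string, so "bcd" is not found even though it appears inside the string (4).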
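A match object carries more than the matched text. This minimal sketch, using the same made-up string and only standard match-object methods, shows group(), span(), start() and end(), plus the None check that should come before group(), because calling group() on None raises AttributeError.

```python
import re

string = "abcdef"

m = re.match("abc", string)
# A match object records what matched and where it matched.
print(m.group())           # abc
print(m.span())            # (0, 3)
print(m.start(), m.end())  # 0 3

# match() returns None when the start of the string does not match,
# so test the result before calling group() (None.group() raises AttributeError).
n = re.match("bcd", string)
if n is None:
    print("no match at the start of the string")
else:
    print(n.group())
```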

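(2) fullmatch(pattern, string, flags=0)

```python
def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)
```

From the docstring above: "Try to apply the pattern to all of the string, returning a match object, or None if no match was found." Unlike match(), which only needs the pattern to match at the beginning of the string, fullmatch() succeeds only if the pattern matches the entire string.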
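A short comparison of match() and fullmatch() on the same made-up string (fullmatch() has been available since Python 3.4):

```python
import re

string = "abcdef"

prefix = re.match("abc", string)          # match(): only the beginning must match
whole = re.fullmatch("abc", string)       # fullmatch(): the entire string must match
everything = re.fullmatch("abc.*", string)

print(prefix.group())      # abc
print(whole)               # None, because "def" is left over
print(everything.group())  # abcdef
```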

(3) search(pattern, string, flags=0)

```python
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
```

The docstring of search() reads: "Scan through string looking for a match to the pattern, returning a match object, or None if no match was found." In other words, search() looks for the pattern at any position in the string; if it finds a match it returns a match object, otherwise None.

Key points: (1) the match may start anywhere in the string, not only at the beginning as with match(); (2) if nothing is found, None is returned.

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.search("a", string)  # scans the string and finds the first "a", wherever it is
print(m)
print(m.group())
n = re.search("N", string)  # no match anywhere in the string
print(n)
```

Output:

```
<_sre.SRE_Match object; span=(2, 3), match='a'>   (1)
a                                                  (2)
None                                               (3)
```

Result (1) shows that search() can find a match starting at any position (here index 2), which makes it more widely useful than match(), which only matches at the start; on success it also returns a match object. (2) As before, call group() to get the matched text out of the match object. (3) If nothing is found, None is returned.
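To make the difference concrete, this sketch runs match() and search() with the same pattern on the string from the example above:

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"

anchored = re.match("a", string)   # match(): only tries position 0, which is "d"
scanned = re.search("a", string)   # search(): scans forward to the first "a"

print(anchored)        # None
print(scanned.span())  # (2, 3)
```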

(4) sub(pattern, repl, string, count=0, flags=0)

```python
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
```

sub() performs search and replace: it finds the occurrences of pattern in string and replaces each one with repl, i.e. repl is what the text matched by the regular expression should become. count limits how many replacements are made; the default 0 means "replace all". For example:

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.sub("a", "A", string)           # (1) no count: replace every match
print(m)
n = re.sub("a", "A", string, count=2)  # (2) replace only the first 2 matches
print(n)                               #     (count by keyword; positional use is deprecated since Python 3.13)
l = re.sub("F", "B", string)           # (3) the pattern does not occur
print(l)
```

Output:

```
ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA   (1)
ddAfsAdadfadfafdafdadfasfdafafda   (2)
ddafsadadfadfafdafdadfasfdafafda   (3)
```

In (1) no count was given, so every match is replaced; in (2) a count was given, so only that many matches are replaced; in (3) the pattern does not occur in the string, so the original string is returned unchanged.

Key points: (1) count limits the number of replacements, and without it every match is replaced; (2) if nothing matches, the original string is returned.
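The docstring above also says repl may be a callable, and that backslash escapes in a string repl are processed. A minimal sketch of both, with made-up inputs; `double` is a hypothetical helper written for this example:

```python
import re

def double(match):
    # Hypothetical helper: called once per match, must return the replacement str.
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "a1b22c333")
print(doubled)  # a2b44c666

# In a string repl, \1, \2 ... refer to the pattern's capturing groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "alice@example")
print(swapped)  # example at alice
```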

(5) subn(pattern, repl, string, count=0, flags=0)

```python
def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)
```

The docstring says subn() returns "a 2-tuple containing (new_string, number)": the string after substitution, and the number of substitutions that were made.

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.subn("a", "A", string)           # (1) replace everything
print(m)
n = re.subn("a", "A", string, count=3)  # (2) replace only the first 3 matches
print(n)
l = re.subn("F", "A", string)           # (3) the pattern does not occur
print(l)
```

Output:

```
('ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA', 11)   (1)
('ddAfsAdAdfadfafdafdadfasfdafafda', 3)    (2)
('ddafsadadfadfafdafdadfasfdafafda', 0)    (3)
```

The output shows that sub() and subn() perform exactly the same replacement; only the return value differs. sub() returns the new string, while subn() returns a tuple holding the new string and the number of replacements made.
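The extra count from subn() is handy when you need to know whether anything changed. A sketch with a made-up input that collapses runs of spaces:

```python
import re

text = "too   many    spaces"  # made-up sample input

# subn() returns (new_string, number_of_substitutions).
collapsed, count = re.subn(" +", " ", text)

if count:
    print(f"collapsed {count} runs of spaces: {collapsed!r}")
else:
    print("nothing to collapse")
```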

(6) split(pattern, string, maxsplit=0, flags=0)

```python
def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)
```

split() splits the string wherever pattern matches and, as the docstring says, returns "a list containing the resulting substrings". For example:

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.split("a", string)              # (1) split at every "a"
print(m)
n = re.split("a", string, maxsplit=3)  # (2) split at most 3 times
print(n)
l = re.split("F", string)              # (3) the separator does not occur
print(l)
```

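Output:

```
['dd', 'fs', 'd', 'df', 'df', 'fd', 'fd', 'df', 'sfd', 'f', 'fd', '']   (1)
['dd', 'fs', 'd', 'dfadfafdafdadfasfdafafda']                          (2)
['ddafsadadfadfafdafdadfasfdafafda']                                   (3)
```

(1) shows that if the string begins or ends with the separator, the result has an empty string "" at that end; here the string ends with "a", so the last element is "". (2) shows that maxsplit limits the number of splits, and the rest of the string becomes the final element. (3) shows that if the separator does not occur, the result is a list whose only element is the original string.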
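Because the separator is a regular expression, one pattern can cover several separators, and (as the docstring notes) capturing parentheses keep the separators in the result. A sketch with made-up inputs:

```python
import re

# Any run of commas, semicolons or spaces acts as a single separator.
fields = re.split("[,; ]+", "a, b;c  d")
print(fields)  # ['a', 'b', 'c', 'd']

# With a capturing group, the separators themselves are kept in the result.
with_seps = re.split("(,)", "a,b,c")
print(with_seps)  # ['a', ',', 'b', ',', 'c']
```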

(7) findall(pattern, string, flags=0)

```python
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)
```

findall() returns a list containing every (non-overlapping) match in the string. For example:

```python
import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
m = re.findall("[a-z]", string)  # (1) every lowercase letter, returned as a list
print(m)
n = re.findall("[0-9]", string)  # (2) every digit, returned as a list
print(n)
l = re.findall("[ABC]", string)  # (3) nothing matches
print(l)
```

Output:

```
['d', 'd', 'a', 'd', 'f', 'a', 'd', 'f', 'a', 'f', 'd', 'a', 'f', 'd', 'a', 'd', 'f', 'a', 's', 'f', 'd', 'a', 'f', 'a', 'f', 'd', 'a']   (1)
['1', '2', '3', '2', '4', '6', '4', '6', '5', '1', '6', '4', '8', '1', '5', '6', '4', '1', '2', '7', '1', '1', '3', '0', '0', '2', '5', '8']   (2)
[]   (3)
```

In (1) every lowercase letter in the string is matched, one letter per match; in (2) the digits are collected into a list; in (3) nothing matches, so an empty list is returned.

Key points: (1) if nothing matches, an empty list is returned; (2) without a quantifier such as +, each match of a character set is a single character.
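Two variations on the example above: adding + makes each match a whole run of digits, and, as the docstring says, a pattern with more than one capturing group makes findall() return tuples. The "x=1, y=22" input is made up for illustration:

```python
import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"

# With +, each match is a whole run of digits instead of a single digit.
numbers = re.findall("[0-9]+", string)
print(numbers)
# ['12', '32', '46465', '1648', '1564', '127', '11', '30', '02', '58']

# With two capturing groups, findall() returns a list of tuples.
pairs = re.findall("([a-z]+)=([0-9]+)", "x=1, y=22")
print(pairs)  # [('x', '1'), ('y', '22')]
```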

(8) finditer(pattern, string, flags=0)

```python
def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)
```

finditer() returns, in the words of its docstring, "an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object."

The code is as follows:

```python
import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
m = re.finditer("[a-z]", string)
print(m)
n = re.finditer("AB", string)  # an iterator is returned even when nothing matches
print(n)
```

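Output:

```
<callable_iterator object at 0x7fa126441898>   (1)
<callable_iterator object at 0x7fa124d6b710>   (2)
```

The output shows that finditer() returns an iterator object rather than a list; printing it reveals only the iterator, not the matches. Note that (2) is an iterator too, even though "AB" never matches; it simply yields nothing.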
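Since printing the iterator shows nothing useful, the usual way to consume finditer() is to loop over it. Each item is a match object, so unlike findall() you also get positions. A sketch on the same string:

```python
import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"

# Iterating yields one match object per match, in order.
positions = []
for m in re.finditer("[0-9]+", string):
    positions.append((m.start(), m.group()))

print(positions[:3])  # [(2, '12'), (5, '32'), (8, '46465')]
```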

(9) compile(pattern, flags=0)

```python
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
```

(10) purge()

```python
def purge():
    "Clear the regular expression caches"
    _cache.clear()
    _cache_repl.clear()
```

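(11) template(pattern, flags=0)

```python
def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)
```

Note that template() was never documented; it was deprecated in Python 3.11 and removed in Python 3.13.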

Using regular expressions:

Basic usage:

```python
import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
p = re.compile("[a-z]+")  # first compile the pattern with compile()
m = p.match(string)       # then match with the compiled pattern object
print(m.group())          # dd
```

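The compile() and match() steps above can also be combined into a single line:

```python
m = re.match("[a-z]+", string)
```

The result is the same. The difference is that the first form compiles (parses) the pattern once, ahead of time, so later matches use the compiled pattern object directly. The one-line form hands re the pattern string on every call, and re must look it up in its internal cache of compiled patterns (the cache that purge() clears) or compile it again. So if you need to find every line that starts with a digit in a file of 50,000 lines, compile the pattern first and then match; this is somewhat faster.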
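A sketch of the compile-once pattern for the 50,000-line case. The `lines` list is hypothetical data standing in for a real file (with a real file you would loop over the open file object instead); no timing is measured here:

```python
import re

# Hypothetical input standing in for a large file: 50,000 lines,
# every other one starting with a digit.
lines = [f"{i} item" if i % 2 else f"item {i}" for i in range(50_000)]

starts_with_digit = re.compile("[0-9]")  # compiled once, outside the loop

matching = [line for line in lines if starts_with_digit.match(line)]
print(len(matching))  # 25000
```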

Pattern syntax:

(1) ^ matches the start of the string

```python
import re

string = "dd12a32d41648f27fd11a0sfdda"
# ^ matches the start of the string; here search() is used to test how the string begins
m = re.search("^[0-9]", string)   # (1) does the string start with a digit?
print(m)
n = re.search("^[a-z]+", string)  # (2) letters at the start; anchored like this, search() behaves like match()
print(n.group())
```

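Output:

```
None
dd
```

In (1) we use ^ to check whether the string starts with a digit; it starts with a letter, so the match fails and None is returned. In (2) we check for letters at the start; the string does start with letters, so the match succeeds. Seen this way, ^ makes search() behave like match(), matching only from the beginning of the string.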
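One detail worth knowing when the text has several lines: by default ^ matches only at the very start of the string, but with the re.MULTILINE flag it also matches right after each newline. A sketch with a made-up three-line string:

```python
import re

text = "12 apples\nbananas\n7 cherries"  # made-up sample input

# By default ^ only matches at the very start of the string...
default = re.findall("^[0-9]+", text)
# ...but with re.MULTILINE it also matches right after each newline.
per_line = re.findall("^[0-9]+", text, flags=re.MULTILINE)

print(default)   # ['12']
print(per_line)  # ['12', '7']
```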

(2) $ matches the end of the string

```python
import re

string = "15111252598"
# ^ and $ together require the whole string to be exactly 11 digits
m = re.match("^[0-9]{11}$", string)
print(m.group())
```

Output:

```
15111252598
```

re.match("^[0-9]{11}$", string) means: from the start (^) to the end ($), the string must consist of exactly 11 digits.
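A few edge cases of the same pattern, with made-up inputs. One subtlety: $ also matches just before a newline at the very end of the string, so fullmatch() is the stricter choice when a trailing newline must be rejected:

```python
import re

pattern = "^[0-9]{11}$"

ok = re.match(pattern, "15111252598")           # exactly 11 digits
too_long = re.match(pattern, "151112525981")    # 12 digits: $ fails after the 11th
trailing_nl = re.match(pattern, "15111252598\n")
strict = re.fullmatch("[0-9]{11}", "15111252598\n")

print(ok.group())   # 15111252598
print(too_long)     # None
print(trailing_nl)  # a match: $ also matches just before a final newline
print(strict)       # None: fullmatch() does not allow the trailing newline
```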

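(3) . (dot) matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.

```python
import re

string = "1511\n1252598"
# . matches any character except a newline
m = re.match(".", string)   # (1) with no quantifier, . matches a single character
print(m.group())
n = re.match(".+", string)  # (2) .+ matches as many non-newline characters as possible
print(n.group())
```

Output:

```
1
1511
```

As the output shows, in (1) . matches one arbitrary character; in (2) we ask for any number of characters, but because the string contains a newline, only the part before the newline is matched and the rest is not.

Key points: (1) . matches any character except a newline; (2) .+ matches one or more characters, none of which may be a newline.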
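The re.DOTALL flag mentioned above changes this behaviour. A sketch using the same string:

```python
import re

string = "1511\n1252598"

default = re.match(".+", string).group()                 # stops at the newline
dotall = re.match(".+", string, flags=re.DOTALL).group()  # "." now matches "\n" too

print(repr(default))  # '1511'
print(repr(dotall))   # '1511\n1252598'
```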

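(4) [...] matches any one of the characters in the brackets; e.g. [abc] matches "a", "b" or "c"

Ranges are allowed inside the brackets: [A-Za-z0-9] matches any character in A-Z, a-z or 0-9.

```python
import re

string = "1511\n125dadfadf2598"
# [...] matches any single character listed in the brackets
m = re.findall("[5fd]", string)  # every 5, f or d in the string
print(m)
```

Output:

```
['5', '5', 'd', 'd', 'f', 'd', 'f', '5']
```

The code above finds every 5, f and d in the string and returns them as a list.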
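Ranges can be combined inside one set, and adding + collects whole runs of set characters. A sketch with a made-up string (note that "_", "-", "," and "!" are not in the set):

```python
import re

tokens = re.findall("[A-Za-z0-9]+", "user_42, Bob-7!")
print(tokens)  # ['user', '42', 'Bob', '7']
```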

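(5) [^...] matches any character not listed in the brackets; e.g. [^abc] matches any character except a, b and c

```python
import re

string = "1511\n125dadfadf2598"
# [^...] matches any character that is NOT listed in the brackets
m = re.findall("[^5fd]", string)  # every character except 5, f and d
print(m)
```

Output:

```
['1', '1', '1', '\n', '1', '2', 'a', 'a', '2', '9', '8']
```

The code above matches every character other than 5, f and d: [^...] matches any character that is not inside the brackets (note that the newline counts too).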
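A common use of a negated set is picking out the fields of a delimited line: [^,]+ means "one or more characters that are not a comma". A sketch with a made-up line; unlike split(), the empty field between the two commas does not appear:

```python
import re

fields = re.findall("[^,]+", "alpha,beta,,gamma")
print(fields)  # ['alpha', 'beta', 'gamma']
```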

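(6) * matches 0 or more repetitions of the preceding expression

```python
import re

string = "1511\n125dadfadf2598"
# * matches 0 or more repetitions of the preceding expression
m = re.findall(r"\d*", string)  # 0 or more digits
print(m)
```

Output:

```
['1511', '', '125', '', '', '', '', '', '', '', '2598', '']
```

The output shows that * matches 0 or more repetitions; here we match 0 or more digits. At every position without a digit, * still succeeds with an empty match, so empty strings ("") appear in the list, including one at the very end of the string.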

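(7) + matches 1 or more repetitions of the preceding expression

```python
import re

string = "1511\n125dadfadf2598"
# + matches 1 or more repetitions of the preceding expression
m = re.findall(r"\d+", string)  # 1 or more digits
print(m)
```

Output:

```
['1511', '125', '2598']
```

Plus (+) matches 1 or more repetitions. Above, \d+ matches runs of one or more digits, so every match contains at least one digit and, unlike with *, no empty strings appear.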

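(8) ? matches 0 or 1 repetition of the preceding expression; placed after another quantifier (as in *?, +? or ??), it makes that quantifier non-greedy

```python
import re

string = "1511\n125dadfadf2598"
# ? matches 0 or 1 repetition of the preceding expression
m = re.findall(r"\d?", string)  # 0 or 1 digit at each position
print(m)
```

Output:

```
['1', '5', '1', '1', '', '1', '2', '5', '', '', '', '', '', '', '', '2', '5', '9', '8', '']
```

The question mark (?) matches 0 or 1 repetition, so here each digit is matched on its own, and at every position without a digit, as well as at the end of the string, an empty string ("") is returned.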
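The non-greedy meaning of ? is easiest to see side by side with the greedy default. A sketch with a made-up HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # made-up sample input

greedy = re.findall("<.+>", html)  # .+ runs as far as it can
lazy = re.findall("<.+?>", html)   # .+? stops at the first possible ">"

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```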

(9) {n} matches exactly n repetitions of the preceding expression

(10) {n,m} matches from n to m repetitions of the preceding expression
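The original gives no example for {n} and {n,m}; this sketch checks a few made-up strings with fullmatch(), so the repetition count has to cover the whole string:

```python
import re

results = {}
for s in ["7", "42", "123", "2024"]:
    results[s] = (
        re.fullmatch("[0-9]{3}", s) is not None,    # exactly 3 digits
        re.fullmatch("[0-9]{2,3}", s) is not None,  # 2 to 3 digits
    )
    print(s, results[s])
```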

(11) \w matches word characters: letters, digits and the underscore

\w matches the letters and digits in the string:

```python
import re

string = "1511\n125dadfadf2598"
# \w matches letters, digits and the underscore
m = re.findall(r"\w", string)  # every word character, one at a time
print(m)
```

Output:

```
['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']
```

The output shows that \w matches the letters and digits in the string; only the newline is left out.

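(12) \W: uppercase W matches any character that is not a letter, digit or underscore, the exact opposite of lowercase \w

Example:

```python
import re

string = "1511\n125dadfadf2598"
# \W matches anything that is not a letter, digit or underscore
m = re.findall(r"\W", string)
print(m)
```

Output:

```
['\n']
```

In the code above, \W matches the characters that are not letters or digits, so the result is the newline.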
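Two details about \w that the examples above do not show: the underscore counts as a word character, and for str patterns \w is Unicode-aware by default, while the re.ASCII flag restricts it to [a-zA-Z0-9_]. A sketch with a made-up string:

```python
import re

text = "snake_case café42 ok!"  # made-up sample input

unicode_words = re.findall(r"\w+", text)                 # default: Unicode-aware
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)   # only [a-zA-Z0-9_]

print(unicode_words)  # ['snake_case', 'café42', 'ok']
print(ascii_words)    # ['snake_case', 'caf', '42', 'ok']
```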

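(13) \s matches any whitespace character; with re.ASCII this is exactly [ \t\n\r\f\v], and by default other Unicode whitespace characters match as well

Example:

```python
import re

string = "1511\n125d\ta\rdf\fadf2598"
# \s matches any whitespace character: space, \t, \n, \r, \f, \v, ...
m = re.findall(r"\s", string)
print(m)
```

Output:

```
['\n', '\t', '\r', '\x0c']
```

The output shows that \s matches whitespace characters: here the newline, tab, carriage return and form feed (the form feed \f is displayed as '\x0c').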

(14) \S matches any non-whitespace character

Example:

```python
import re

string = "1511\n125d\ta\rdf\fadf2598"
# \S matches any character that is not whitespace
m = re.findall(r"\S", string)
print(m)
```

Output:

```
['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']
```

The output shows that \S matches every non-whitespace character in the string.

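(15) \d matches any digit, equivalent to [0-9] (for str patterns it also matches other Unicode decimal digits unless re.ASCII is used)

(16) \D matches any character that is not a digit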
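A sketch showing \d and \D together on a made-up phone-number string: \d+ pulls out the digit runs, while substituting every \D with "" keeps only the digits:

```python
import re

phone = "tel: 151-1125-2598"  # made-up sample input

digits = re.findall(r"\d+", phone)      # runs of digits
only_digits = re.sub(r"\D", "", phone)  # delete every non-digit character

print(digits)       # ['151', '1125', '2598']
print(only_digits)  # 15111252598
```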

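Summary: findall() and split() both return lists, and they are complementary: split() treats the matches as separators and returns the text between them, while findall() returns the matches themselves.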
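The complementary relationship in one sketch, using the same pattern on a made-up string:

```python
import re

s = "dd12a32d465"

pieces = re.split(r"\d+", s)     # the text between runs of digits
matches = re.findall(r"\d+", s)  # the runs of digits themselves

print(pieces)   # ['dd', 'a', 'd', '']
print(matches)  # ['12', '32', '465']
```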

