# Regex: Search and Match

The re module has two more functions: search and match. I don’t find them as handy as .sub or .findall, but they are worth knowing. Assume we have a string s="A8C3721D86".

# search

## Check the returned value

.search returns a re.Match object. It scans through the string and stops at the first position where the pattern matches; if the pattern matches nowhere, it returns None

import re

s="A8C3721D86"

r = re.search(r"\d", s)

r

<re.Match object; span=(1, 2), match='8'>
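As a minimal sketch of the behaviour described above, the snippet below reuses the same string and shows both outcomes of search: a match object for the first hit, and None when the pattern does not occur. The variable names first_digit and no_match are only illustrative.

```python
import re

s = "A8C3721D86"

# search scans from left to right and stops at the first match
first_digit = re.search(r"\d", s)
print(first_digit.group())  # 8
print(first_digit.span())   # (1, 2)

# when the pattern occurs nowhere in the string, search returns None
no_match = re.search(r"z", s)
print(no_match)  # None
```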


# match

## Check the returned value

.match returns None here, because match only tries to match at the beginning of the string, and s starts with the letter A

import re

s="A8C3721D86"

r = re.match(r"\d", s)

type(r)

NoneType
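To make the difference concrete, here is a small side-by-side sketch on the same string. The names at_start, anywhere and letter_digit are made up for this illustration.

```python
import re

s = "A8C3721D86"

# match only tries position 0, where the character is "A"
at_start = re.match(r"\d", s)
# search is free to move forward until it finds a digit
anywhere = re.search(r"\d", s)
# match succeeds when the pattern fits the start of the string
letter_digit = re.match(r"[A-Z]\d", s)

print(at_start)              # None
print(anywhere.group())      # 8
print(letter_digit.group())  # A8
```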


Now, change the first two characters to digits

import re

s="99C3721D86"

r = re.match(r"\d{2}", s)

r

<re.Match object; span=(0, 2), match='99'>


Now check the returned object. .group() returns the matched value

r.group()

'99'


.span() returns the position of the match as a (start, end) tuple, where end is exclusive

r.span()

(0, 2)
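Since the end index is exclusive, the span can be used directly as a slice. A short sketch, assuming the same string and pattern as above:

```python
import re

s = "99C3721D86"
r = re.match(r"\d{2}", s)

# unpack the span and slice the original string with it
start, end = r.span()
print(start, end)          # 0 2
print(s[start:end])        # 99

# start() and end() return the two halves of the span separately
print(r.start(), r.end())  # 0 2
```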


# match only once

Unlike .findall and .sub, which process every match in the string, .search and .match return only the first match
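A quick comparison on the original string shows the difference. This is only a sketch; the pattern \d+ and the replacement "#" are chosen for illustration.

```python
import re

s = "A8C3721D86"

first = re.search(r"\d+", s)       # stops at the first run of digits
every = re.findall(r"\d+", s)      # collects every run of digits
replaced = re.sub(r"\d+", "#", s)  # replaces every run of digits

print(first.group())  # 8
print(every)          # ['8', '3721', '86']
print(replaced)       # A#C#D#
```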

# group

Now let’s look at the .group method. Data scraped from web pages usually contains HTML tags. Say I have scraped a string s that wraps the text "Life is short, I use python" in span tags, and I want to get the contents between the tags

import re

s = "<span>Life is short, I use python</span>"

r = re.search("<span></span>", s)

type(r)

NoneType


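It returns None because this pattern only matches an opening tag immediately followed by the closing tag, with nothing in between. Now recall that . matches any character except \n, so .* can stand for everything between the tags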

r = re.search("<span>.*</span>", s)

r.group()

'<span>Life is short, I use python</span>'


It returns the whole string, tags included. We can wrap .* in parentheses, (.*), to capture the contents between the tags as a group

r = re.search("<span>(.*)</span>", s)

r.group()

'<span>Life is short, I use python</span>'


.group’s default argument is 0, which returns the whole matched string. In this case, we should pass 1 to get the first captured group, which is our desired result

r.group(1)

'Life is short, I use python'
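Because search returns None when nothing matches, calling .group(1) on the result can raise an AttributeError. One possible way to guard against that is sketched below. The helper inner_text is hypothetical (it is not part of re), and it assumes the tag name is a plain word with no special regex characters.

```python
import re

def inner_text(html, tag="span"):
    """Return the text between <tag> and </tag>, or None if the tag is absent."""
    m = re.search(rf"<{tag}>(.*)</{tag}>", html)
    # only call .group(1) when there actually is a match object
    return m.group(1) if m else None

found = inner_text("<span>Life is short, I use python</span>")
missing = inner_text("no tags here")

print(found)    # Life is short, I use python
print(missing)  # None
```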


This is the result I want! How about .findall?

r = re.findall("<span>(.*)</span>", s)
r

['Life is short, I use python']


When the pattern contains a single group, .findall returns a list of that group’s contents, so I get exactly what I want in a list.

Now let’s add one more group.

s = "<span>Life is short, I use python</span> So <h3>This is Python</h3>"
r = re.search("<span>(.*)</span>(.*)<h3>(.*)</h3>", s)
r.group(0)

'<span>Life is short, I use python</span> So <h3>This is Python</h3>'

r.group(1)

'Life is short, I use python'

r.group(2)

' So '

r.group(3)

'This is Python'


.groups() is more convenient: it returns all captured groups at once in a tuple

r.groups()

('Life is short, I use python', ' So ', 'This is Python')
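Since the result is a plain tuple, it can be unpacked straight into variables. The sketch below also shows what .findall does with the same three-group pattern: it returns a list with one tuple per match. The variable names are illustrative.

```python
import re

s = "<span>Life is short, I use python</span> So <h3>This is Python</h3>"
pattern = r"<span>(.*)</span>(.*)<h3>(.*)</h3>"

r = re.search(pattern, s)

# unpack the three captured groups in one step
span_text, between, h3_text = r.groups()
print(span_text)        # Life is short, I use python
print(between.strip())  # So
print(h3_text)          # This is Python

# with several groups, findall returns a list of tuples, one per match
all_matches = re.findall(pattern, s)
print(all_matches)
```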