#37. Python Regular Expression

By Ifeanyi Omeata


Topics:


1. Import REGEX module
2. The findall() Function
3. The search() Function
4. The split() Function
5. The sub() Function
6. The Match Object Methods
7. Find set of characters in string
8. Escape special sequence character
9. Search for any character in string
10. Check if string starts with character(s)
11. Check if string ends with character(s)
12. Zero or more character occurrences
13. One or more character occurrences
14. Zero or one character occurrences
15. Specified number of character occurrences
16. Either/or character occurrences
17. Characters are at the beginning of the string
18. Characters are at the end of the string
19. Characters are at the beginning or end of a word
20. Characters are NOT at the beginning or end of a word
21. Characters contain digits
22. Characters DO NOT contain digits
23. Contains white space character
24. Contains NON white space characters
25. Contains word or numeric characters
26. Contains NON word or numeric characters


1. Import REGEX module



import re


2. The findall() Function


The findall() function returns a list containing all matches.

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)


import re

txt = "The rain in Spain"

x = re.findall("Portugal", txt)
print(x)

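
Because findall() always returns a list, the no-match case is simply an empty list, which is falsy. A minimal sketch of a counting helper built on that behaviour; count_matches is a hypothetical name used for illustration, not part of the re module:

```python
import re

def count_matches(pattern, text):
    """Return how many non-overlapping matches of pattern occur in text."""
    # findall() returns [] when nothing matches, so len() is always safe.
    return len(re.findall(pattern, text))

txt = "The rain in Spain"

print(count_matches("ai", txt))        # 2
print(count_matches("Portugal", txt))  # 0
```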


3. The search() Function


The search() function searches the string for a match, and returns a Match object if there is a match. If there is more than one match, only the first occurrence of the match will be returned.

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print(x)
print(x.start())
print(x.end())


import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)

print(x)

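
Since search() returns None when nothing matches, calling start() or end() on the result of a failed search raises an AttributeError. A minimal sketch of guarding against that; first_match_span is a hypothetical helper name, not something the re module provides:

```python
import re

def first_match_span(pattern, text):
    """Return (start, end) of the first match, or None if there is none."""
    m = re.search(pattern, text)
    if m is None:
        # No match: there is no Match object to call start()/end() on.
        return None
    return m.start(), m.end()

txt = "The rain in Spain"

print(first_match_span(r"\s", txt))       # (3, 4)
print(first_match_span("Portugal", txt))  # None
```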


4. The split() Function


The split() function returns a list where the string has been split at each match.

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)

print(x)


You can limit the number of splits by specifying the maxsplit parameter.

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)

print(x)

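
One detail the single-space examples do not show: split() cuts at every match, so two spaces in a row produce an empty string in the result. A small sketch of that, using a made-up sample string, along with the usual workaround of matching one or more whitespace characters:

```python
import re

# Hypothetical sample with a double space between the words.
txt = "The  rain"

single = re.split(r"\s", txt)    # splits at each whitespace character
grouped = re.split(r"\s+", txt)  # treats a run of whitespace as one separator

print(single)   # ['The', '', 'rain']
print(grouped)  # ['The', 'rain']
```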


5. The sub() Function


The sub() function replaces the matches with the text of your choice.

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "|", txt)

print(x)


You can limit the number of replacements by specifying the count parameter.

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "|", txt, count=2)

print(x)

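
If you also need to know how many replacements were made, the re module has a companion function, subn(), which returns the new string together with the replacement count. A short sketch using the same sample text:

```python
import re

txt = "The rain in Spain"

# subn() returns a (new_string, number_of_replacements) tuple.
replaced, n = re.subn(r"\s", "|", txt)
print(replaced, n)  # The|rain|in|Spain 3

# The count parameter works the same way as in sub().
limited, n_limited = re.subn(r"\s", "|", txt, count=2)
print(limited, n_limited)  # The|rain|in Spain 2
```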


6. The Match Object Methods



The Match object has properties and methods used to retrieve information about the search, and the result:

  • span() returns a tuple containing the start, and end positions of the match.
  • string returns the string passed into the function.
  • group() returns the part of the string where there was a match.

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

print(x)
print(x.start())
print(x.end())
print(x.span())
print(x.string)
print(x.group())

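
The values above are related: slicing the original string with the positions from span() gives the same text as group(). A minimal sketch that checks this for the same search:

```python
import re

txt = "The rain in Spain"
m = re.search(r"\bS\w+", txt)

if m is not None:
    start, end = m.span()
    print(m.span())        # (12, 17)
    print(txt[start:end])  # Spain
    # m.string is the text that was searched, so the slice equals group().
    print(m.string[start:end] == m.group())  # True
```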


7. Find set of characters in string


Find all lower case characters alphabetically between "a" and "m":

import re

txt = "The rain in Spain"

x = re.findall("[a-m]", txt)
print(x)

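
Sets are not limited to one lower-case range. As a small sketch on the same text, a set can hold a different range, and a leading ^ inside the brackets negates the set (here a space is also listed so that spaces are excluded):

```python
import re

txt = "The rain in Spain"

lower_a_to_m = re.findall("[a-m]", txt)
upper_case = re.findall("[A-Z]", txt)
# [^...] matches any character that is NOT listed in the set.
not_a_to_m = re.findall("[^a-m ]", txt)

print(lower_a_to_m)  # ['h', 'e', 'a', 'i', 'i', 'a', 'i']
print(upper_case)    # ['T', 'S']
print(not_a_to_m)    # ['T', 'r', 'n', 'n', 'S', 'p', 'n']
```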


8. Escape special sequence character


Find all digit characters with \d:

import re

txt = "That will be 59 dollars"

x = re.findall(r"\d", txt)
print(x)

p = re.sub(r"\d", "#", txt)
print(p)

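
The backslash also works in the other direction: placed before a metacharacter such as ".", it makes that character literal. A short sketch with a made-up price string; re.escape() is the standard helper that adds the backslashes for you:

```python
import re

# Hypothetical sample text: one real price and one look-alike.
txt = "Prices: 3.50 and 3x50"

loose = re.findall("3.50", txt)               # "." matches any character
literal = re.findall(r"3\.50", txt)           # "\." matches only a dot
escaped = re.findall(re.escape("3.50"), txt)  # same effect as the line above

print(loose)    # ['3.50', '3x50']
print(literal)  # ['3.50']
print(escaped)  # ['3.50']
```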


9. Search for any character in string


Search for a sequence that starts with "he", followed by two (any) characters, and an "o":

import re

txt = "hello planet"

x = re.findall("he..o", txt)
print(x)



10. Check if string starts with character(s)


Check if the string starts with 'hello':

import re

txt = "hello planet"

#Check if the string starts with 'hello':

x = re.findall("^hello", txt)

print(x)
if x:
  print("Yes, the string starts with 'hello'")
else:
  print("No match")



11. Check if string ends with character(s)


Check if the string ends with 'planet':

import re

txt = "hello planet"

x = re.findall("planet$", txt)

print(x)
if x:
  print("Yes, the string ends with 'planet'")
else:
  print("No match")



12. Zero or more character occurrences


Search for a sequence that starts with "he", followed by zero or more of any character, and then an "o":

import re

txt = "hello planet"

x = re.findall("he.*o", txt)

print(x)

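
With only one "o" in the sample text, the example does not show that * is greedy: it consumes as much as it can while still allowing a match. A minimal sketch with a made-up string that contains several "o" characters, compared with the non-greedy form *?:

```python
import re

# Hypothetical sample text with more than one possible end point.
txt = "hello planet, hello moon"

greedy = re.findall(r"he.*o", txt)  # .* takes as much as it can
lazy = re.findall(r"he.*?o", txt)   # .*? takes as little as it can

print(greedy)  # ['hello planet, hello moo']
print(lazy)    # ['hello', 'hello']
```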


13. One or more character occurrences


Search for a sequence that starts with "he", followed by one or more of any character, and then an "o":

import re

txt = "hello planet"

x = re.findall("he.+o", txt)

print(x)



14. Zero or one character occurrences


Search for a sequence that starts with "he", followed by zero or one of any character, and then an "o":

import re

txt = "hello planet"

x = re.findall("he.?o", txt)

print(x)



15. Specified number of character occurrences


Search for a sequence that starts with "he", followed by exactly two of any character, and then an "o":

import re

txt = "hello planet"

x = re.findall("he.{2}o", txt)

print(x)

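
To compare the four quantifiers side by side, the patterns from the last four sections can be run against the same text in one loop. A small sketch; only "he.?o" fails, because "hello" has two characters between "he" and "o":

```python
import re

txt = "hello planet"
patterns = ["he.*o", "he.+o", "he.?o", "he.{2}o"]

# Map each pattern to the list of matches it finds in txt.
results = {p: re.findall(p, txt) for p in patterns}

for p, found in results.items():
    print(p, "->", found)
# he.*o -> ['hello']
# he.+o -> ['hello']
# he.?o -> []
# he.{2}o -> ['hello']
```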


16. Either/or character occurrences


Check if the string contains either "falls" or "stays":

import re

txt = "The rain in Spain falls mainly in the plain!"

x = re.findall("falls|stays", txt)

print(x)
if x:
  print("Yes, there is at least one match!")
else:
  print("No match")



17. Characters are at the beginning of the string



#Check if the string starts with "The":

import re

txt = "The rain in Spain"

x = re.findall(r"\AThe", txt)

print(x)

if x:
  print("Yes, there is a match!")
else:
  print("No match")



18. Characters are at the end of the string



#Check if the string ends with "Spain":

import re

txt = "The rain in Spain"

x = re.findall(r"Spain\Z", txt)

print(x)

if x:
  print("Yes, there is a match!")
else:
  print("No match")

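
On a single-line string, \A and \Z behave like ^ and $, so it may help to see where they differ. A minimal sketch using a made-up two-line string: with the re.MULTILINE flag, ^ matches at the start of every line while \A still matches only at the start of the string, and $ tolerates a trailing newline while \Z does not:

```python
import re

# Hypothetical two-line sample that ends with a newline.
txt = "The rain\nThe plain\n"

caret = re.findall(r"^The", txt, re.MULTILINE)     # start of every line
slash_a = re.findall(r"\AThe", txt, re.MULTILINE)  # start of the string only

dollar = re.findall(r"plain$", txt)    # $ also matches just before a final newline
slash_z = re.findall(r"plain\Z", txt)  # \Z matches only at the very end

print(caret)    # ['The', 'The']
print(slash_a)  # ['The']
print(dollar)   # ['plain']
print(slash_z)  # []
```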


19. Characters are at the beginning or end of a word



import re

txt = "The rain in Spain"

def is_match(x):
  if x:
    print(x,"Yes, there is at least one match!")
  else:
    print(x,"No match")

#Check if "ain" is present at the beginning of a WORD:
x = re.findall(r"\bain", txt)
is_match(x)

#Check if "ain" is present at the end of a WORD:
x = re.findall(r"ain\b", txt)
is_match(x)



20. Characters are NOT at the beginning or end of a word



import re

txt = "The rain in Spain"

def is_match(x):
  if x:
    print(x,"Yes, there is at least one match!")
  else:
    print(x,"No match")

#Check if "ain" is present, but NOT at the beginning of a WORD:
x = re.findall(r"\Bain", txt)
is_match(x)

#Check if "ain" is present, but NOT at the end of a WORD:
x = re.findall(r"ain\B", txt)
is_match(x)

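
Putting the four boundary patterns from the last two sections next to each other makes the symmetry easier to see. A small sketch on the same text: in "rain" and "Spain", "ain" sits at the end of the word and never at the beginning, so \b and \B give mirror-image results:

```python
import re

txt = "The rain in Spain"
patterns = [r"\bain", r"ain\b", r"\Bain", r"ain\B"]

# Map each pattern to the list of matches it finds in txt.
results = {p: re.findall(p, txt) for p in patterns}

for p, found in results.items():
    print(p, "->", found)
# \bain -> []
# ain\b -> ['ain', 'ain']
# \Bain -> ['ain', 'ain']
# ain\B -> []
```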


21. Characters contain digits


Check if the string contains any digits (numbers from 0-9):

import re

txt = "The rain in Spain"

x = re.findall(r"\d", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

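
The sample string above contains no digits, so the result is an empty list. As a contrasting sketch, here is the same check on the "59 dollars" sentence from section 8, plus \d+ to collect a whole run of digits as one number:

```python
import re

txt = "That will be 59 dollars"

has_digit = re.search(r"\d", txt) is not None
digits = re.findall(r"\d", txt)    # one match per digit
numbers = re.findall(r"\d+", txt)  # one match per run of digits

print(has_digit)  # True
print(digits)     # ['5', '9']
print(numbers)    # ['59']
```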


22. Characters DO NOT contain digits


Return a match at every non-digit character:

import re

txt = "The rain in Spain"

x = re.findall(r"\D", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")



23. Contains white space character


Return a match at every white-space character:

import re

txt = "The rain in Spain"

x = re.findall(r"\s", txt)

print(x)
if x:
  print("Yes, there is at least one match!")
else:
  print("No match")



24. Contains NON white space characters



import re

txt = "The rain in Spain"

#Return a match at every NON white-space character:

x = re.findall(r"\S", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")



25. Contains word or numeric characters


Return a match at every word character (letters, digits, and the underscore _ character):

import re

txt = "The rain in Spain"

x = re.findall(r"\w", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")



26. Contains NON word or numeric characters


Return a match at every NON word character (anything that is not a letter, digit, or underscore, such as "!", "?", and white space):

import re

txt = "The rain in Spain"

x = re.findall(r"\W", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

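
The last six sections come in pairs: each upper-case sequence (\D, \S, \W) matches exactly the characters its lower-case partner (\d, \s, \w) does not. A minimal sketch, with a made-up sample string, that checks each pair accounts for every character exactly once:

```python
import re

# Hypothetical sample containing letters, digits, spaces, and punctuation.
txt = "Rain: 42 mm_total!"

pairs = [(r"\d", r"\D"), (r"\s", r"\S"), (r"\w", r"\W")]
results = {}

for lower, upper in pairs:
    results[lower] = re.findall(lower, txt)
    results[upper] = re.findall(upper, txt)
    # Every character matches one of the two sequences, never both.
    print(lower, len(results[lower]), upper, len(results[upper]))

print(results[r"\d"])  # ['4', '2']
print(results[r"\W"])  # [':', ' ', ' ', '!']
```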

#End


Hope you enjoyed this! :) Follow me for more content...


Get in Touch:
ifeanyiomeata.com

Youtube: youtube.com/c/IfeanyiOmeata
Linkedin: linkedin.com/in/omeatai
Twitter: twitter.com/iomeata
Github: github.com/omeatai
Stackoverflow: stackoverflow.com/users/2689166/omeatai
Hashnode: hashnode.com/@omeatai
Medium: medium.com/@omeatai
© 2022