5-5. Matching

Regular expressions are a powerful way to specify patterns for matching strings. In Python, the built-in re module provides functions for working with them. Let's break this down with an example and some explanations.

RegEx functions

  • findall(): Returns a list containing all non-overlapping matches
  • search(): Returns a Match object for the first match anywhere in the string, or None if there is no match
  • split(): Returns a list where the string has been split at each match
  • sub(): Replaces one or many matches with a string
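
As a quick illustration of the four functions listed above, here is a minimal sketch. The sample string and the variable names are made up for this example.

```python
import re

text = "The rain in Spain"  # sample string chosen for illustration

# findall(): every non-overlapping match, as a list of strings
all_ai = re.findall(r"ai", text)

# search(): the first match anywhere in the string, or None if there is none
first_space = re.search(r"\s", text)
first_space_pos = first_space.start() if first_space else None

# split(): cut the string at each match
words = re.split(r"\s", text)

# sub(): replace each match with another string
underscored = re.sub(r"\s", "_", text)

print(all_ai)           # ['ai', 'ai']
print(first_space_pos)  # 3
print(words)            # ['The', 'rain', 'in', 'Spain']
print(underscored)      # The_rain_in_Spain
```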

Metacharacters

Character  Description                                                 Example
[]         A set of characters                                         "[a-m]"
\          Signals a special sequence, or escapes a special character  "\d"
.          Any character except a newline                              "he..o"
^          Starts with                                                 "^hello"
$          Ends with                                                   "planet$"
*          Zero or more occurrences                                    "he.*o"
+          One or more occurrences                                     "he.+o"
?          Zero or one occurrence                                      "he.?o"
{}         Exactly the specified number of occurrences                 "he.{2}o"
|          Either or                                                   "falls|stays"
()         Capture and group
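
The metacharacters above can be tried out with a short sketch like the following. The test strings ("hello planet", "v1.2.3") and the alternative "goodbye" are assumptions made for this example, not part of the table.

```python
import re

text = "hello planet"  # sample string, made up for this sketch

# ^ and $ anchor the pattern to the start and end of the string
starts_with_hello = re.search(r"^hello", text) is not None   # True
ends_with_planet = re.search(r"planet$", text) is not None   # True

# . stands for exactly one character; {2} repeats the previous item twice
two_dots = re.findall(r"he..o", text)       # ['hello']
exactly_two = re.findall(r"he.{2}o", text)  # ['hello']

# ? allows at most one character between "he" and "o", so "hello" does not fit
at_most_one = re.findall(r"he.?o", text)    # []

# | picks whichever alternative is present
either = re.findall(r"hello|goodbye", text)  # ['hello']

# () captures the parts of the match that fall inside the parentheses
both_words = re.search(r"(\w+) (\w+)", text).groups()  # ('hello', 'planet')

# \ turns a metacharacter back into a literal character
literal_dots = re.findall(r"\.", "v1.2.3")  # ['.', '.']
```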

 

Special Sequences

Character  Description                                                            Example
\A         Matches if the characters are at the beginning of the string           "\AThe"
\b         Matches if the characters are at the beginning or end of a word        r"\bain", r"ain\b"
\B         Matches if the characters are NOT at the beginning (or end) of a word  r"\Bain", r"ain\B"
\d         Matches any digit (0-9)                                                "\d"
\D         Matches any character that is NOT a digit                              "\D"
\s         Matches any whitespace character                                       "\s"
\S         Matches any character that is NOT whitespace                           "\S"
\w         Matches any word character (a-z, A-Z, 0-9, and the underscore _)       "\w"
\W         Matches any character that is NOT a word character                     "\W"
\Z         Matches if the characters are at the end of the string                 "Spain\Z"

The r prefix in r"\bain" makes Python treat the pattern as a raw string, so \b reaches the re module as a word boundary instead of being read as a backspace character.
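
To see how a few of these sequences behave, here is a small sketch. The sample string is an assumption made for illustration. It also shows why the raw-string prefix matters for \b.

```python
import re

text = "The rain in Spain 2024"  # sample string, made up for this sketch

# Without the r prefix, "\b" is a single backspace character
plain_len = len("\b")   # 1
raw_len = len(r"\b")    # 2: a backslash followed by the letter b

# \b: "ain" at the start of a word (none here) or at the end of a word
ain_at_word_start = re.findall(r"\bain", text)  # []
ain_at_word_end = re.findall(r"ain\b", text)    # ['ain', 'ain']

# \B: "ain" NOT at the start of a word, or NOT at the end of a word
ain_inside_start = re.findall(r"\Bain", text)   # ['ain', 'ain']
ain_inside_end = re.findall(r"ain\B", text)     # []

# \d: every digit; \S+: every run of non-whitespace characters
digits = re.findall(r"\d", text)     # ['2', '0', '2', '4']
chunks = re.findall(r"\S+", text)    # ['The', 'rain', 'in', 'Spain', '2024']

# \A and \Z: anchors for the very start and very end of the string
starts_with_the = re.search(r"\AThe", text) is not None    # True
ends_with_2024 = re.search(r"2024\Z", text) is not None    # True
```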

 

Sets

Set         Description
[arn]       Matches any one of the specified characters (a, r, or n)
[a-n]       Matches any lowercase character alphabetically between a and n
[^arn]      Matches any character EXCEPT a, r, and n
[0123]      Matches any one of the specified digits (0, 1, 2, or 3)
[0-9]       Matches any digit between 0 and 9
[0-5][0-9]  Matches any two-digit number from 00 to 59
[a-zA-Z]    Matches any character alphabetically between a and z, lowercase OR uppercase
[+]         In sets, +, *, ., |, (), and $ have no special meaning, so [+] matches any + character in the string
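
The sets above can be checked with a short sketch such as this one. All input strings are made up for illustration.

```python
import re

# [arn]: any one of the characters a, r, or n
arn = re.findall(r"[arn]", "rain")        # ['r', 'a', 'n']

# [^arn]: any character except a, r, and n
not_arn = re.findall(r"[^arn]", "rain")   # ['i']

# [a-n]: lowercase letters from a to n ("S" and "p" fall outside the range)
a_to_n = re.findall(r"[a-n]", "Spain")    # ['a', 'i', 'n']

# [0-9]: single digits; [0-5][0-9]: two-digit numbers from 00 to 59
clock = "8 times before 11:45 AM"
single_digits = re.findall(r"[0-9]", clock)    # ['8', '1', '1', '4', '5']
two_digits = re.findall(r"[0-5][0-9]", clock)  # ['11', '45']

# [+]: inside a set, + is just a literal plus sign
plus_signs = re.findall(r"[+]", "1+1=2")  # ['+']
```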

Examples

Example in Python Interactive Mode

Here's an example of how to use the re.match function to test whether a pattern matches a string:

>>> import re

>>> # Get help information about re.match
>>> help(re.match)

>>> # Use re.match to find a match for the pattern ".*bob.*" in the string "bob and alice"
>>> match = re.match(".*bob.*", "bob and alice")
>>> print(match)

 

Output will be:

<re.Match object; span=(0, 13), match='bob and alice'>

 

Explanation:

  • Importing the Module: First, we import the re module, which contains the functions needed for regular expressions.
  • Getting Help: Using help(re.match), we can see detailed information about the re.match function.
  • Matching: The re.match function checks whether the pattern in the first parameter matches at the beginning of the string in the second parameter. If it does, it returns a Match object; otherwise it returns None.

In our example, the pattern ".*bob.*" matches the whole string "bob and alice". The pattern requires the string to contain "bob", with any number of characters (including none) before and after it. Here nothing comes before "bob" and " and alice" comes after it.
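
Because re.match only looks at the beginning of the string, the leading .* in the pattern matters as soon as "bob" is not the first word. The following sketch, with a made-up string, compares re.match with re.search, which scans the whole string.

```python
import re

text = "alice and bob"  # made-up string where "bob" is not at the start

# re.match anchors at the beginning, so the bare pattern "bob" fails here
at_start = re.match(r"bob", text)       # None

# re.search scans the whole string and finds "bob" at the end
anywhere = re.search(r"bob", text)      # span (10, 13)

# Padding the pattern with .* lets re.match succeed on the whole string
padded = re.match(r".*bob.*", text)     # span (0, 13)

print(at_start)
print(anywhere.span())
print(padded.span())
```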

Special Characters in Regular Expressions

Regular expressions use special characters to define patterns efficiently:

  • .*: This means "zero or more instances of any single character (except a newline)."
  • .: This means "any single character (except a newline)."

These special characters allow us to create complex patterns without writing lengthy logic.
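
The difference between . and .* can be seen in a small sketch. It uses re.fullmatch so that the whole string has to fit the pattern, and the re.DOTALL flag to show the newline exception; both are standard parts of the re module, but the test strings are assumptions made for this example.

```python
import re

# "." must consume exactly one character
one_char = re.fullmatch(r"b.b", "bob") is not None     # True
too_many = re.fullmatch(r"b.b", "boob") is not None    # False

# ".*" consumes zero or more characters
many_chars = re.fullmatch(r"b.*b", "boob") is not None  # True
zero_chars = re.fullmatch(r"b.*b", "bb") is not None    # True

# By default "." does not match a newline; the re.DOTALL flag changes that
newline_default = re.search(r".", "\n") is not None             # False
newline_dotall = re.search(r".", "\n", re.DOTALL) is not None   # True
```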

Practical Use

Regular expressions are widely used in programming for tasks like:

  • Searching for specific patterns in text.
  • Validating input data (like email addresses or phone numbers).
  • Replacing parts of strings.
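
The three tasks above can be sketched as follows. The log line, the NNN-NNNN phone format, and the helper name is_valid_phone are hypothetical and chosen only for illustration; real phone or email validation usually needs a more careful pattern.

```python
import re

# Searching: pull a number out of a (made-up) log line
log_line = "user=alice id=42 status=ok"
found = re.search(r"id=(\d+)", log_line)
user_id = int(found.group(1)) if found else None   # 42

# Validating: hypothetical phone format of three digits, a dash, four digits
PHONE_PATTERN = re.compile(r"\d{3}-\d{4}")

def is_valid_phone(value):
    """Return True if the whole string fits the assumed NNN-NNNN format."""
    return PHONE_PATTERN.fullmatch(value) is not None

# Replacing: mask every digit in a string
masked = re.sub(r"\d", "#", "call 555-1234")       # 'call ###-####'

print(user_id)
print(is_valid_phone("555-1234"), is_valid_phone("5551234"))
print(masked)
```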