April 16, 2020
Regular expressions are an extremely powerful method of searching and extracting information from strings. A good, basic tutorial is available here.
In the most common use case, you test a string against a regular expression, using the function search from the re module:
import re  # we need to import the re module to use it

test_string = "My name is John Doe"
# test whether test_string contains "name"
# (pay attention to the r in front of the string; we need this)
match = re.search(r"name", test_string)
if match:  # did we find a match?
    print("Test string matches.")
    print("Match:", match.group())  # print out the part of the string that matched
else:
    print("Test string doesn't match.")
test_string = "My email is john@utexas.edu"
# test whether test_string contains "name"
match = re.search(r"name", test_string)
if match:  # did we find a match?
    print("Test string matches.")
    print("Match:", match.group())
else:
    print("Test string doesn't match.")
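The comment in the first example says that the r in front of the pattern is needed. Here is a minimal sketch of why: in a raw string, Python leaves backslashes alone, so they reach the regular expression engine intact. The pattern r"\bname\b" (with \b meaning a word boundary) is an illustrative assumption and not part of the examples above.

```python
import re

# In a raw string the backslash is kept as a literal character.
print(len(r"\b"))  # a backslash followed by the letter b
print(len("\b"))   # Python turns \b into a single backspace character

test_string = "My name is John Doe"

# With the r prefix, \b reaches the regex engine and means "word boundary".
raw_match = re.search(r"\bname\b", test_string)
# Without it, the pattern contains backspace characters,
# which do not occur in the test string.
plain_match = re.search("\bname\b", test_string)

print("Raw pattern matched:", raw_match is not None)
print("Plain pattern matched:", plain_match is not None)
```

For a simple pattern such as r"name" the prefix makes no difference, but using it consistently avoids surprises once a pattern contains backslashes.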
Much of the power of regular expressions stems from the fact that you can match on general patterns. For example, \S+ will match a sequence of one or more non-whitespace characters:
test_string = "My age is secret."
match = re.search(r"My \S+ is", test_string)
print("Match:", match.group())
test_string = "My mood is good."
match = re.search(r"My \S+ is", test_string)
print("Match:", match.group())
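The two examples above call match.group() directly, which only works because both strings match. As a small sketch of what happens otherwise, the loop below tries the same pattern on two additional strings that are assumptions made up for illustration. When there is no match, re.search returns None, so it is safer to check the result first.

```python
import re

pattern = r"My \S+ is"
results = {}

for test_string in ["My age is secret.", "My favorite color is blue.", "My  is"]:
    match = re.search(pattern, test_string)
    if match:
        results[test_string] = match.group()
        print(repr(test_string), "->", match.group())
    else:
        # re.search returned None, so match.group() would raise an AttributeError
        results[test_string] = None
        print(repr(test_string), "-> no match")
```

The second string fails because \S+ cannot cross the space between "favorite" and "color", and the third fails because \S+ needs at least one non-whitespace character.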
We can also capture substrings using regular expressions, by enclosing the parts of interest in parentheses ():
test_string = "My age is secret."
match = re.search(r"My (\S+) is (\S+)", test_string)
print("Match:", match.group(0))
print("Captured group 1:", match.group(1))
print("Captured group 2:", match.group(2))
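One detail of the capture example is easy to miss: \S+ matches punctuation too, so the second group includes the period at the end of the sentence. The sketch below shows this, using match.groups() to get all captured substrings at once. The variant with \w+ (letters, digits, and underscores only) is an assumption added for comparison, not part of the original example.

```python
import re

test_string = "My age is secret."
match = re.search(r"My (\S+) is (\S+)", test_string)

# groups() returns all captured substrings as a tuple
print(match.groups())

# \S+ also matches punctuation, so group 2 ends with the period
print(repr(match.group(2)))

# Assumed variant: \w+ stops before the period
match_words = re.search(r"My (\w+) is (\w+)", test_string)
print(match_words.groups())
```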
Problem 1:
Use the online Python regular expression editor available at http://pythex.org/ to explore regular expressions. For each of the given test strings, find regular expressions that achieve the given goals.
1. Test string: "my email is: john@utexas.edu"
   - "my email is"
   - "@utexas.edu"
   - "utexas.edu"
   - email address
2. Test string: "phone number: 123-456-7890"
   - "phone number:" and capture the phone number
3. Invent a few more problems and solutions on your own.
Problem 2:
Write Python code that can take a string of the form "My name is: ...", extract the name (indicated here by ...), and then print it. Make sure you get the full name, not just the first name.
test_string = "My name is: John Doe"
# your code goes here
Problem 3:
Write a function that can parse phone numbers in any sort of format and print them out in the standard 123-456-7890 format.
def clean_phone_number(input):
    # implement your function here
    pass  # delete this, it is here just as a placeholder
# all these calls should produce the number 123-456-7890
clean_phone_number("1234567890")
clean_phone_number("+1 (123) 456-7890")
clean_phone_number("1 123 456 7890")
clean_phone_number("(123) 4567890")
# the function should realize that this is not a valid phone number
clean_phone_number("123456")
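If you get stuck, here is one possible approach, offered as a rough sketch and not as the intended solution. It assumes US-style numbers with ten digits and an optional leading country code 1, and it treats every non-digit character as a separator. The function name format_phone_number is hypothetical, and it returns the result so the caller can print it.

```python
import re

def format_phone_number(raw):
    """Return the number as 123-456-7890, or None if it is not a valid number."""
    # Remove everything that is not a digit.
    digits = re.sub(r"\D", "", raw)
    # Assumption: an 11-digit number starting with 1 carries the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    # Exactly ten digits must remain; capture them in groups of 3, 3, and 4.
    match = re.fullmatch(r"(\d{3})(\d{3})(\d{4})", digits)
    if match is None:
        return None
    return "-".join(match.groups())

for raw in ["1234567890", "+1 (123) 456-7890", "1 123 456 7890",
            "(123) 4567890", "123456"]:
    print(repr(raw), "->", format_phone_number(raw))
```

Stripping the non-digits first keeps the pattern short. A stricter version could validate the separators themselves instead of discarding them.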