Text Processing
====================================================
Regular expressions, also known as regex or regexp, are a powerful tool used to match patterns in text. In this summary, we'll cover the basics of regular expressions and provide examples using Python's built-in re module.
A regular expression is a pattern that describes a set of strings. It specifies the structure of the text to match: which characters are allowed, in what order, and how many times each may repeat.
[abc] matches any one of a, b, or c
a* matches zero or more occurrences of a
ab matches the sequence a followed by b
(a|b) matches either a or b
\w: word character (equivalent to [a-zA-Z0-9_] in ASCII mode; for Python 3 str patterns it also matches Unicode letters and digits)
\d: digit
\s: whitespace character
\D: non-digit character
\S: non-whitespace character
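The pattern elements listed above can be checked directly in Python. The following is a minimal sketch using re.fullmatch and re.findall; the sample string and variable names are made up for illustration.

```python
import re

# Character set: exactly one of a, b, or c
assert re.fullmatch(r"[abc]", "b") is not None
assert re.fullmatch(r"[abc]", "d") is None

# Quantifier: a* accepts zero or more a's, including the empty string
assert re.fullmatch(r"a*", "") is not None
assert re.fullmatch(r"a*", "aaa") is not None

# Alternation: (a|b) accepts either a or b
assert re.fullmatch(r"(a|b)", "b") is not None

# Shorthand classes applied to an illustrative sample string
sample = "Room 42, floor 7"
digits = re.findall(r"\d", sample)       # every single digit
words = re.findall(r"\w+", sample)       # runs of word characters
non_space = re.findall(r"\S+", sample)   # runs of non-whitespace characters

print(digits)     # ['4', '2', '7']
print(words)      # ['Room', '42', 'floor', '7']
print(non_space)  # ['Room', '42,', 'floor', '7']
```

Note that \S+ keeps the comma attached to "42," because a comma is not whitespace, while \w+ drops it because a comma is not a word character.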
```python
import re

phone_number = "123-456-7890"
pattern = r"\d{3}-\d{3}-\d{4}"

# re.match checks for the pattern at the start of the string only
if re.match(pattern, phone_number):
    print("Phone number matches pattern")
else:
    print("Phone number does not match pattern")
```
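In this example, the regular expression \d{3}-\d{3}-\d{4} matches three digits, a hyphen, three more digits, another hyphen, and four digits. Because re.match checks only at the start of the string, a string with extra characters after the number, such as "123-456-7890 ext. 5", also matches; use re.fullmatch when the entire string must fit the pattern.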
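The example above uses re.match, which looks for the pattern only at the start of the string. As a rough illustration of how that differs from re.search and re.fullmatch, the sketch below runs all three on a few sample strings; the helper check and the sample strings are hypothetical, chosen only to show the contrast.

```python
import re

pattern = r"\d{3}-\d{3}-\d{4}"

exact = "123-456-7890"
trailing = "123-456-7890 ext. 5"      # extra text after the number
embedded = "Call 123-456-7890 today"  # number in the middle of the string


def check(text):
    """Report which of the three functions find the pattern in text."""
    return {
        "match": re.match(pattern, text) is not None,          # start of string
        "search": re.search(pattern, text) is not None,        # anywhere
        "fullmatch": re.fullmatch(pattern, text) is not None,  # whole string
    }


results = {text: check(text) for text in (exact, trailing, embedded)}
for text, outcome in results.items():
    print(f"{text!r}: {outcome}")
```

Only the exact string passes all three checks. The string with trailing text passes match and search but not fullmatch, and the embedded number is found by search alone.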
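```python
import re

text = "Hello world, this is an example sentence."
pattern = r"\b\w*e\w*\b"  # any run of word characters that contains an e
words = re.findall(pattern, text)
print(words)  # Output: ['Hello', 'example', 'sentence']
```
In this example, the regular expression \b\w*e\w*\b matches any word that contains the character e: \w* allows zero or more word characters on either side of the e, and the \b anchors match word boundaries. Note that "Hello" is matched as well, because it also contains an e.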
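To see what the \b anchors contribute, it can help to run the same search with and without them. The sketch below is only an illustration; it reuses the sentence from the example above and searches for the short word "is", which was picked here because it also occurs inside "this".

```python
import re

text = "Hello world, this is an example sentence."

# Without boundaries, "is" is also found inside "this"
loose = re.findall(r"is", text)

# With \b on both sides, only the standalone word "is" is found
strict = re.findall(r"\bis\b", text)

print(loose)   # ['is', 'is']
print(strict)  # ['is']

# finditer exposes where each match starts, which makes the difference visible
loose_positions = [m.start() for m in re.finditer(r"is", text)]
strict_positions = [m.start() for m in re.finditer(r"\bis\b", text)]
print(loose_positions)   # [15, 18]
print(strict_positions)  # [18]
```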
```python
import re

text = "Hello world, this is an example sentence."
pattern = r"\bexample\b"
replacement = "sample"
new_text = re.sub(pattern, replacement, text)
# Only the word "example" is replaced; the preceding article "an" is left unchanged
print(new_text)  # Output: Hello world, this is an sample sentence.
```
In this example, the re.sub function replaces all occurrences of the pattern \bexample\b with the string "sample".
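Since re.sub replaces every occurrence by default, it may be useful to see how to limit or count the replacements. The following sketch assumes an illustrative string with three occurrences of the word; it uses the count keyword of re.sub and the related function re.subn, both part of the standard re module.

```python
import re

text = "one example, two example, three example"  # illustrative string

# Default: every non-overlapping match is replaced
all_replaced = re.sub(r"\bexample\b", "sample", text)

# count=1 limits the substitution to the first match
first_only = re.sub(r"\bexample\b", "sample", text, count=1)

# re.subn returns the new string together with the number of replacements
result, n = re.subn(r"\bexample\b", "sample", text)

print(all_replaced)  # one sample, two sample, three sample
print(first_only)    # one sample, two example, three example
print(n)             # 3
```

Passing count as a keyword argument keeps the call unambiguous, since the next positional parameter of re.sub after the string is also an integer (flags).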
Regular expressions are a powerful tool for text processing in Python. With the basics and examples provided above, you're ready to start using regex patterns to match and manipulate text data.