Regular expressions are powerful but often overused. Here's when and how to use them.

When to Use Regex

Use regex for:

  • Complex pattern matching
  • Extracting multiple groups
  • Validation with specific formats
  • Find/replace with patterns

Use string methods for:

  • Simple checks (startswith, endswith, in)
  • Basic splits and joins
  • Case changes
  • Strip/trim

import re
url = "https://example.com"

# Don't use regex
if re.match(r"^https://", url):  # Overkill
    print("secure")

# Use string methods
if url.startswith("https://"):  # Better
    print("secure")
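Here are a few more of the string-method cases from the list, each checked against a regex that does the same job. This is a small sketch; the filename and row values are made up for illustration.

```python
import re

filename = "report.csv"
row = "  name, age , city  "

# endswith() instead of a "$"-anchored pattern
assert filename.endswith(".csv") == bool(re.search(r"\.csv$", filename))

# "in" instead of re.search() for a fixed substring
assert ("port" in filename) == bool(re.search(r"port", filename))

# strip() + split() instead of a regex split with \s*
fields = [f.strip() for f in row.strip().split(",")]
assert fields == re.split(r"\s*,\s*", row.strip())
print(fields)  # ['name', 'age', 'city']
```

The string-method versions say exactly what they check, which is the point of the rule.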

Basic Usage

import re
 
# Search anywhere in string
match = re.search(r"error", "An error occurred")
if match:
    print(match.group())  # "error"
 
# Match at start
match = re.match(r"Hello", "Hello world")
 
# Find all matches
matches = re.findall(r"\d+", "1 apple, 2 oranges, 3 bananas")
# ['1', '2', '3']
 
# Replace
result = re.sub(r"\d+", "X", "1 apple, 2 oranges")
# "X apple, X oranges"
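The difference between re.search and re.match trips people up, and both return None when nothing matches. The sketch below also uses re.fullmatch (available since Python 3.4), which isn't covered above but is handy for validation.

```python
import re

text = "Hello world"

# search() scans the whole string; match() only tries position 0
assert re.search(r"world", text).group() == "world"
assert re.match(r"world", text) is None

# Both return None on failure, so check before calling .group()
m = re.match(r"Hello", text)
print(m.group() if m else "no match")  # Hello

# fullmatch() requires the pattern to cover the entire string
assert re.fullmatch(r"Hello", text) is None
assert re.fullmatch(r"Hello \w+", text) is not None

# span() gives the start and end indices of a match
print(re.search(r"world", text).span())  # (6, 11)
```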

Common Patterns

# Digits
r"\d"       # Single digit
r"\d+"      # One or more digits
r"\d{3}"    # Exactly 3 digits
r"\d{2,4}"  # 2 to 4 digits
 
# Word characters
r"\w"       # Letter, digit, or underscore
r"\w+"      # One or more word characters
 
# Whitespace
r"\s"       # Any whitespace
r"\s+"      # One or more whitespace
 
# Anchors
r"^start"   # Start of string
r"end$"     # End of string
r"\bword\b" # Word boundary
 
# Character classes
r"[abc]"    # a, b, or c
r"[a-z]"    # Lowercase letter
r"[^abc]"   # Not a, b, or c
 
# Quantifiers
r"a?"       # Zero or one
r"a*"       # Zero or more
r"a+"       # One or more
r"a{3}"     # Exactly 3
r"a{2,5}"   # 2 to 5
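A quick way to get a feel for these pieces is to run them through re.findall on a sample string. The text below is made up for illustration.

```python
import re

text = "Order 42 shipped in 2024; sword and word differ."

# \d{2,4} is greedy: it takes up to 4 digits at a time
assert re.findall(r"\d{2,4}", text) == ["42", "2024"]

# \b stops "word" from matching inside "sword"
assert re.findall(r"\bword\b", text) == ["word"]
assert re.findall(r"word", text) == ["word", "word"]

# [^...] negates a class: here, runs of non-whitespace characters
assert re.findall(r"[^\s]+", "a bc  def") == ["a", "bc", "def"]

# ? makes the "u" optional, so both spellings match
assert re.findall(r"colou?r", "color colour") == ["color", "colour"]
```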

Groups

# Capturing groups
match = re.search(r"(\d+)-(\d+)", "Phone: 123-4567")
if match:
    print(match.group(0))  # "123-4567" (full match)
    print(match.group(1))  # "123"
    print(match.group(2))  # "4567"
    print(match.groups())  # ('123', '4567')
 
# Named groups
match = re.search(r"(?P<area>\d+)-(?P<number>\d+)", "123-4567")
print(match.group("area"))  # "123"
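Two related behaviors are worth knowing: groupdict() returns all named groups at once, and re.findall returns the groups (not the full match) when the pattern has capturing groups. A non-capturing group, written (?:...), groups without capturing; the URL recipe later relies on it. The phone-style strings are made up.

```python
import re

m = re.search(r"(?P<area>\d+)-(?P<number>\d+)", "Call 123-4567")
if m:
    # groupdict() returns every named group as a dict
    print(m.groupdict())  # {'area': '123', 'number': '4567'}

# With capturing groups, findall() returns tuples of groups
text = "123-4567, 555-0000"
print(re.findall(r"(\d+)-(\d+)", text))  # [('123', '4567'), ('555', '0000')]

# (?:...) groups without capturing, so findall() returns whole matches
print(re.findall(r"\d+(?:-\d+)", text))  # ['123-4567', '555-0000']
```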

Compiled Patterns

Compile for reuse:

# Compile once
pattern = re.compile(r"\d+")
 
# Use many times
pattern.search("abc123")
pattern.findall("1, 2, 3")
pattern.sub("X", "abc123")

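A compiled pattern skips the cache lookup and compilation step that the module-level functions perform on each call. Python's re module already caches recently used patterns, so the speedup is often modest; giving the pattern a name and defining it in one place is frequently the bigger win.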
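A typical use is compiling a pattern once at module level and reusing it inside a loop. This is a sketch; the function name error_messages and the log lines are hypothetical.

```python
import re

# Compiled once, reused for every line
ERROR_RE = re.compile(r"ERROR: (.+)")

def error_messages(lines):
    """Return the text after 'ERROR: ' for each line that contains it."""
    messages = []
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            messages.append(m.group(1))
    return messages

logs = [
    "10:30:45 ERROR: Connection failed",
    "10:30:46 INFO: Retrying",
    "10:30:47 ERROR: Timeout",
]
print(error_messages(logs))  # ['Connection failed', 'Timeout']
```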

Flags

# Case insensitive
re.search(r"hello", "HELLO", re.IGNORECASE)
re.search(r"(?i)hello", "HELLO")
 
# Multiline (^ and $ match line boundaries)
re.findall(r"^\w+", "line1\nline2", re.MULTILINE)
 
# Dot matches newline
re.search(r"a.b", "a\nb", re.DOTALL)
 
# Verbose (allows comments)
pattern = re.compile(r"""
    \d{3}   # Area code
    -       # Separator
    \d{4}   # Number
""", re.VERBOSE)
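A runnable sketch of the flags above. Flags can be combined with the | operator; the snippet shows MULTILINE and DOTALL with and without the flag, and rebuilds the verbose pattern so it can be tested standalone. The sample strings are made up.

```python
import re

# Flags combine with |
m = re.search(r"^error", "ok\nERROR here", re.IGNORECASE | re.MULTILINE)
print(m.group())  # ERROR

print(re.findall(r"^\w+", "line1\nline2", re.MULTILINE))  # ['line1', 'line2']
print(re.findall(r"^\w+", "line1\nline2"))                # ['line1']

# Without DOTALL, "." does not match "\n"
assert re.search(r"a.b", "a\nb") is None
assert re.search(r"a.b", "a\nb", re.DOTALL) is not None

# Verbose mode ignores whitespace and comments inside the pattern
phone = re.compile(r"""
    \d{3}   # First 3 digits
    -       # Separator
    \d{4}   # Last 4 digits
""", re.VERBOSE)
assert phone.fullmatch("555-1234") is not None
assert phone.fullmatch("555 1234") is None
```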

Common Recipes

Email (simple)

email_pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"
re.fullmatch(email_pattern, "user@example.com")  # whole string must match
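For validation, the anchoring matters: re.match only anchors at the start of the string, so trailing junk still passes, while re.fullmatch anchors at both ends. The candidate addresses are made up, and the pattern is still only a rough filter, not a full email validator.

```python
import re

email_pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"

candidates = [
    "user@example.com",
    "first.last+tag@mail.co.uk",
    "no-at-sign.com",
    "user@example.com trailing text",
]

# re.match anchors only at the start; re.fullmatch anchors at both ends
results = {
    c: (re.match(email_pattern, c) is not None,
        re.fullmatch(email_pattern, c) is not None)
    for c in candidates
}
for c, (loose, strict) in results.items():
    print(f"{c!r}: match={loose} fullmatch={strict}")
```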

URL

url_pattern = r"https?://[\w.-]+(?:/[\w./-]*)?"
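Running the URL pattern through re.findall shows what it does and doesn't capture. Because its only group is non-capturing, findall returns whole matches. With this made-up text, the query string is cut off at "?", and a sentence-ending period is included because "." is in the character class; the sketch trims it with rstrip.

```python
import re

url_pattern = r"https?://[\w.-]+(?:/[\w./-]*)?"

text = ("See https://docs.python.org/3/library/re.html, "
        "https://example.com/search?q=regex and http://example.com.")

raw = re.findall(url_pattern, text)
# ['https://docs.python.org/3/library/re.html',
#  'https://example.com/search', 'http://example.com.']

# Strip a trailing period picked up from the end of the sentence
urls = [u.rstrip(".") for u in raw]
print(urls)
```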

Phone number

phone_pattern = r"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}"
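A common follow-up is pulling phone numbers out of free text and normalizing them to digits only with re.sub. The sample text is made up; note that this pattern does not handle the "(555) 123-4567" style.

```python
import re

phone_pattern = r"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}"

text = "Call 555-123-4567 or 555.987.6543, fax 5551112222."
numbers = re.findall(phone_pattern, text)

# \D matches any non-digit, so this strips the separators
normalized = [re.sub(r"\D", "", n) for n in numbers]
print(normalized)  # ['5551234567', '5559876543', '5551112222']

# Parenthesized area codes are not covered by this pattern
print(re.search(phone_pattern, "(555) 123-4567"))  # None
```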

Extract data

log_line = "2024-03-21 10:30:45 ERROR: Connection failed"
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+): (.+)"
match = re.match(pattern, log_line)
date, time, level, message = match.groups()
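The one-line version calls match.groups() directly, which raises AttributeError when a line doesn't fit the format (re.match returns None). Here is a sketch that uses named groups and skips non-matching lines; the LOG_RE name and sample lines are hypothetical.

```python
import re

LOG_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+): (?P<message>.+)"
)

lines = [
    "2024-03-21 10:30:45 ERROR: Connection failed",
    "not a log line",
    "2024-03-21 10:31:02 INFO: Reconnected",
]

records = []
for line in lines:
    m = LOG_RE.match(line)
    if m is None:
        continue  # skip lines that don't fit the format
    records.append(m.groupdict())

print(records[0]["level"], "-", records[0]["message"])  # ERROR - Connection failed
print(len(records))  # 2
```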

Gotchas

Raw strings

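text = "a word here"

# Wrong: "\b" is a backspace character, so this looks for literal backspaces
re.search("\bword\b", text)

# Right: the raw string keeps \b as the word-boundary escape
re.search(r"\bword\b", text)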

Always use r"..." for regex patterns.
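You can see the difference directly by comparing the string literals themselves: without the r prefix, Python turns "\b" into a single backspace character before the regex engine ever sees it.

```python
import re

# In a normal string literal, "\b" is one character: backspace (\x08)
assert "\b" == "\x08" and len("\b") == 1
# In a raw string it stays two characters: backslash + "b"
assert r"\b" == "\\b" and len(r"\b") == 2

text = "a word here"
assert re.search("\bword\b", text) is None           # searches for backspaces
assert re.search(r"\bword\b", text).group() == "word"
```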

Greedy vs lazy

text = "<tag>content</tag>"
 
# Greedy (default) - matches as much as possible
re.search(r"<.*>", text).group()  # "<tag>content</tag>"
 
# Lazy - matches as little as possible
re.search(r"<.*?>", text).group()  # "<tag>"
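With re.findall the difference is even clearer, since the greedy version collapses everything into one match. A negated character class, [^>]*, is a common alternative that expresses "up to the next >" without relying on lazy matching. The HTML string is made up.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: one match from the first "<" to the last ">"
print(re.findall(r"<.*>", html))     # ['<b>bold</b> and <i>italic</i>']

# Lazy: each match stops at the nearest ">"
print(re.findall(r"<.*?>", html))    # ['<b>', '</b>', '<i>', '</i>']

# Negated class: anything except ">" between the brackets
print(re.findall(r"<[^>]*>", html))  # ['<b>', '</b>', '<i>', '</i>']
```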

My Rules

  1. Try string methods first — simpler is better
  2. Use raw strings — always r"pattern"
  3. Compile if reused — for performance
  4. Comment complex patterns — use re.VERBOSE
  5. Test thoroughly — edge cases matter
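For rule 5, a small table of inputs and expected results is a cheap way to test a pattern. The check helper below is hypothetical, and the cases use the date part of the log pattern; the last case shows that \d{2} happily accepts a month of 13.

```python
import re

def check(pattern, cases):
    """Return the inputs where re.fullmatch disagrees with the expected result."""
    compiled = re.compile(pattern)
    return [text for text, expected in cases
            if (compiled.fullmatch(text) is not None) != expected]

date_cases = [
    ("2024-03-21", True),
    ("2024-3-21", False),    # single-digit month
    ("2024-03-21 ", False),  # trailing space
    ("2024-13-45", False),   # \d{2} doesn't know months stop at 12
]
print(check(r"\d{4}-\d{2}-\d{2}", date_cases))  # ['2024-13-45']
```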

Regex is a tool. Use it when appropriate, not everywhere.
