Generators let you iterate over data without loading it all into memory. Here's how they work.

The Problem

# Loads entire file into memory
lines = open("huge_file.txt").readlines()
for line in lines:
    process(line)
 
# Memory efficient - one line at a time
with open("huge_file.txt") as f:
    for line in f:
        process(line)

The second approach works because file objects are lazy iterators: each pass of the loop reads just one line from disk, and the with block closes the file when the loop ends.

Generator Functions

Use yield instead of return:

def count_up_to(n):
    i = 0
    while i < n:
        yield i
        i += 1
 
# Creates generator object (no computation yet)
gen = count_up_to(5)
 
# Values computed on demand
for num in gen:
    print(num)  # 0, 1, 2, 3, 4

Each yield pauses the function and hands a value back to the caller. The next iteration resumes right after the yield, with all local state preserved.
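You can watch this pause/resume behavior by driving the generator by hand with next() (count_up_to is repeated here so the snippet runs on its own):

```python
# Same function as above, repeated so this snippet is self-contained
def count_up_to(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen = count_up_to(3)
print(next(gen))  # 0 - runs the body until the first yield, then pauses
print(next(gen))  # 1 - resumes right after the yield
print(next(gen))  # 2
# A further next(gen) raises StopIteration: the function body has returned
```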

Generator Expressions

Like list comprehensions, but lazy:

# List comprehension - all in memory
squares = [x**2 for x in range(1000000)]
 
# Generator expression - computed on demand
squares = (x**2 for x in range(1000000))
 
# Use parentheses, not brackets
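When a generator expression is the only argument to a function, you can drop its own parentheses, which makes memory-cheap aggregation one-liners:

```python
# Sums squares without materializing a list first
total = sum(x**2 for x in range(10))
print(total)  # 285

# any() and all() short-circuit: iteration stops as soon
# as the answer is known
print(any(x**2 > 50 for x in range(10)))  # True (stops at x=8)
```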

Common Patterns

Reading large files

def read_chunks(file_path, chunk_size=8192):
    with open(file_path, "rb") as f:
        while chunk := f.read(chunk_size):  # walrus operator, Python 3.8+
            yield chunk
 
for chunk in read_chunks("huge_file.bin"):
    process(chunk)

Infinite sequences

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
 
# Take first 10
from itertools import islice
fibs = list(islice(fibonacci(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Transforming data

def uppercase_lines(lines):
    for line in lines:
        yield line.upper()
 
# Chain generators; with closes the file afterwards
with open("data.txt") as lines:
    upper = uppercase_lines(lines)
    for line in upper:
        print(line)

itertools

The standard library's toolbox for iterator operations:

from itertools import (
    islice,      # Take first N items
    chain,       # Combine iterators
    cycle,       # Repeat infinitely
    repeat,      # Repeat value
    count,       # Infinite counter
    takewhile,   # Take while condition true
    dropwhile,   # Skip while condition true
    groupby,     # Group consecutive items
    filterfalse, # Keep items where the predicate is false
)
 
# Examples
list(islice(count(10), 5))        # [10, 11, 12, 13, 14]
list(chain([1, 2], [3, 4]))       # [1, 2, 3, 4]
list(takewhile(lambda x: x < 5, [1, 3, 5, 2]))  # [1, 3]
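groupby deserves a note: it only groups *consecutive* items, so the input usually needs to be sorted by the same key first. A small sketch:

```python
from itertools import groupby

words = ["apple", "ant", "bear", "bat", "cat"]
# Input is already ordered by first letter, so each group comes out whole
for letter, group in groupby(words, key=lambda w: w[0]):
    print(letter, list(group))
# a ['apple', 'ant']
# b ['bear', 'bat']
# c ['cat']
```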

Generator Methods

def gen():
    while True:
        value = yield
        print(f"Received: {value}")
 
g = gen()
next(g)           # Prime: advance to the first yield
g.send(10)        # Received: 10
g.send(20)        # Received: 20
g.close()         # Stop generator
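send() becomes more interesting when the generator yields something back. Here's a sketch of a hypothetical running-average coroutine (not part of any library, just an illustration of the value = yield pattern):

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the current average, wait for input
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime: advance to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```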

yield from

Delegate to another generator:

def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item
 
list(flatten([1, [2, [3, 4]], 5]))
# [1, 2, 3, 4, 5]
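For plain iteration, yield from iterable behaves like for item in iterable: yield item (it also forwards send() and return values, which the loop form does not). It works on any iterable, not just generators:

```python
def chain_two(a, b):
    yield from a  # same effect here as: for item in a: yield item
    yield from b

print(list(chain_two([1, 2], "ab")))  # [1, 2, 'a', 'b']
```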

Memory Comparison

import sys
 
# List: stores all values
list_data = [x for x in range(1000000)]
print(sys.getsizeof(list_data))  # ~8 MB
 
# Generator: stores only the generator object
gen_data = (x for x in range(1000000))
print(sys.getsizeof(gen_data))   # ~200 bytes

When to Use Generators

Use generators when:

  • Processing large files
  • Infinite sequences
  • Memory is constrained
  • You only need to iterate once
  • Chaining transformations

Use lists when:

  • You need random access
  • You need to iterate multiple times
  • Data is small
  • You need list methods (append, sort, etc.)
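The "iterate once" point is worth seeing in code - a generator is exhausted after one pass, which is a common source of silent bugs:

```python
gen = (x * 2 for x in range(3))
print(list(gen))  # [0, 2, 4]
print(list(gen))  # [] - exhausted; re-create the generator to iterate again
```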

My Patterns

# Process files line by line
def process_log(path):
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                yield parse_error(line)
 
# Chain generators for pipelines
raw_data = read_file("data.csv")
parsed = parse_rows(raw_data)
filtered = (row for row in parsed if row["valid"])
transformed = transform(filtered)
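A self-contained version of that pipeline idea, using the stdlib csv module (the field names and the "valid" rule here are invented for illustration):

```python
import csv
import io

# Toy in-memory CSV standing in for data.csv
raw = io.StringIO("name,score\nalice,90\nbob,\ncarol,75\n")

rows = csv.DictReader(raw)                  # parse rows lazily
valid = (r for r in rows if r["score"])     # drop rows with an empty score
scores = (int(r["score"]) for r in valid)   # transform

# Nothing has been read yet; list() pulls rows through the whole pipeline
print(list(scores))  # [90, 75]
```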

Generators are Python's way of handling data streams. Master them for memory-efficient code.
