Python's gc module gives you control over the garbage collector. Most of the time you don't need it, because Python handles memory automatically. But for debugging memory leaks or optimizing performance-critical code, it's invaluable.

How Python Memory Works

Python uses two mechanisms:

  1. Reference counting (automatic): Objects are freed when their reference count hits zero
  2. Garbage collection (gc module): Catches circular references that reference counting misses

# Reference counting handles this
x = [1, 2, 3]
x = None  # List is freed immediately
 
# GC handles this
a = []
b = []
a.append(b)
b.append(a)  # Circular reference!
a = b = None  # Each list is still referenced by the other, so neither count reaches zero, yet both are unreachable
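
The difference between the two mechanisms can be observed directly with weak references. The following is a minimal sketch that assumes CPython's reference-counting behavior; `Tracked` is a hypothetical helper class, used because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class Tracked:
    """Hypothetical helper class; plain lists cannot be weakly referenced."""


# Case 1: no cycle, so reference counting frees the object right away.
obj = Tracked()
probe = weakref.ref(obj)      # a weak reference does not keep obj alive
obj = None
freed_by_refcount = probe() is None

# Case 2: a cycle keeps both reference counts above zero.
gc.disable()                  # keep an automatic collection from interfering
a, b = Tracked(), Tracked()
a.other, b.other = b, a
probe = weakref.ref(a)
a = b = None
alive_before_collect = probe() is not None
gc.collect()                  # the cyclic collector finds the unreachable pair
freed_by_gc = probe() is None
gc.enable()

print(freed_by_refcount, alive_before_collect, freed_by_gc)
```

On CPython all three values should be True: the first object disappears as soon as its name is rebound, while the cyclic pair survives until a collection runs.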

Basic gc Operations

import gc
 
# Force garbage collection
gc.collect()
 
# Check if GC is enabled
print(gc.isenabled())  # True
 
# Disable automatic cyclic collection (rarely needed); reference counting keeps working
gc.disable()
# Re-enable it
gc.enable()

Finding Memory Leaks

import gc
 
# Get all objects tracked by GC
all_objects = gc.get_objects()
print(f"Tracked objects: {len(all_objects)}")
 
# Find objects of a specific type
lists = [obj for obj in gc.get_objects() if isinstance(obj, list)]
print(f"Lists in memory: {len(lists)}")
 
# Get uncollectable objects
gc.collect()
garbage = gc.garbage  # Unreachable objects the collector found but could not free
print(f"Uncollectable: {len(garbage)}")  # Usually 0 on Python 3.4+
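
When reading these numbers, keep in mind that gc.get_objects() only returns objects the collector tracks, which in CPython means containers that could take part in a cycle. A small sketch using gc.is_tracked() illustrates this; `Holder` is a hypothetical class introduced for the example.

```python
import gc


class Holder:
    """Hypothetical example class."""


# Atomic values cannot be part of a cycle, so the collector does not track them.
int_tracked = gc.is_tracked(42)
str_tracked = gc.is_tracked("text")

# Containers can reference other objects, so they are tracked.
list_tracked = gc.is_tracked([])

# Instances of user-defined classes are tracked and show up in get_objects().
holder = Holder()
holder_listed = any(obj is holder for obj in gc.get_objects())

print(int_tracked, str_tracked, list_tracked, holder_listed)
```

This is why a leak made of large strings or bytes objects may not show up in a gc.get_objects() count, even though the memory is still in use.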

GC Statistics

import gc
 
# Current counters per generation
print(gc.get_count())  # (count0, count1, count2): running counters compared against the thresholds
 
# Collection thresholds
print(gc.get_threshold())  # (700, 10, 10) by default on many CPython versions; newer releases may differ
 
# Cumulative per-generation stats since the interpreter started
gc.collect()
print(gc.get_stats())
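
The raw output of gc.get_stats() is a list of dictionaries, one per generation. A short sketch of how it might be formatted for logging follows; the keys used are the ones CPython documents, while the output layout is just one possible choice.

```python
import gc

gc.collect()
stats_by_generation = gc.get_stats()

for generation, stats in enumerate(stats_by_generation):
    # collections: how many times this generation was collected
    # collected: total objects freed in this generation
    # uncollectable: total objects found uncollectable and moved to gc.garbage
    print(f"gen {generation}: runs={stats['collections']} "
          f"collected={stats['collected']} "
          f"uncollectable={stats['uncollectable']}")
```

Because the values are cumulative, a monitoring job would typically record them periodically and look at the differences between samples.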

Generational Collection

Python uses three generations:

  • Gen 0: New objects, collected frequently
  • Gen 1: Survived one collection
  • Gen 2: Long-lived objects, collected rarely

import gc
 
# Set thresholds (gen0, gen1, gen2)
gc.set_threshold(1000, 15, 15)
 
# Collect specific generation
gc.collect(0)  # Gen 0 only
gc.collect(1)  # Gen 0 and 1
gc.collect(2)  # All generations (same as gc.collect())
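
Threshold changes are process-wide, so it can be safer to apply them only around the code that benefits and restore the old values afterwards. The sketch below assumes an allocation-heavy step (here just a list comprehension standing in for real work); the value 5000 is an arbitrary illustration, not a recommendation.

```python
import gc

original = gc.get_threshold()            # remember the current tuning

# Let more allocations accumulate before a generation 0 collection runs.
gc.set_threshold(5000, *original[1:])
try:
    data = [[i] for i in range(10_000)]  # stand-in for allocation-heavy work
finally:
    gc.set_threshold(*original)          # restore the previous thresholds

restored = gc.get_threshold() == original
print(restored)
```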

Debug Flags

import gc
 
# Enable debug output (each set_debug call replaces the flags set by the previous one)
gc.set_debug(gc.DEBUG_STATS)       # Collection statistics
gc.set_debug(gc.DEBUG_COLLECTABLE) # Show collectable objects
gc.set_debug(gc.DEBUG_UNCOLLECTABLE)  # Show uncollectable
gc.set_debug(gc.DEBUG_SAVEALL)     # Save all unreachable objects to gc.garbage instead of freeing them
 
# Combine flags with | (DEBUG_LEAK = DEBUG_COLLECTABLE | DEBUG_UNCOLLECTABLE | DEBUG_SAVEALL)
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)
 
# Turn off
gc.set_debug(0)
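
DEBUG_SAVEALL is the flag that makes gc.garbage useful for inspection, because it keeps every unreachable object instead of freeing it. Here is a minimal sketch of that workflow; `Cyclic` is a hypothetical class used only to build a cycle.

```python
import gc


class Cyclic:
    """Hypothetical class used only to build a reference cycle."""


gc.collect()                        # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)      # keep unreachable objects in gc.garbage
try:
    a, b = Cyclic(), Cyclic()
    a.other, b.other = b, a
    del a, b
    gc.collect()
    # gc.garbage may hold other objects too, so filter for the type of interest.
    saved = [obj for obj in gc.garbage if isinstance(obj, Cyclic)]
finally:
    gc.set_debug(0)                 # turn debugging off again
    gc.garbage.clear()              # release the saved objects

print(len(saved))
```

Clearing gc.garbage at the end matters: while objects sit in that list they stay alive, so leaving the flag on in production would turn every cycle into a real leak.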

Reference Tracking

import gc
 
class MyClass:
    pass
 
obj = MyClass()
 
# What references this object?
gc.collect()
referrers = gc.get_referrers(obj)
print(f"Referrers: {len(referrers)}")
 
# What does this object reference?
obj.data = [1, 2, 3]
referents = gc.get_referents(obj)
print(f"Referents: {referents}")
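
In practice the list returned by gc.get_referrers() is noisy, since it includes module namespaces and other bookkeeping containers. One way to narrow it down, sketched here with a hypothetical `Session` class and `registry` dict, is to filter the referrers by type.

```python
import gc


class Session:
    """Hypothetical object suspected of leaking."""


registry = {"active": []}            # hypothetical long-lived container
session = Session()
registry["active"].append(session)

# Keep only list referrers; the module's globals dict also refers to `session`.
holders = [ref for ref in gc.get_referrers(session) if isinstance(ref, list)]
held_by_registry = any(ref is registry["active"] for ref in holders)

print(held_by_registry)
```

Once the holding container is identified, calling gc.get_referrers() on that container walks one step further up the chain.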

Breaking Circular References

import gc
import weakref
 
# Problem: circular reference
class Node:
    def __init__(self):
        self.parent = None
        self.children = []
    
    def add_child(self, child):
        self.children.append(child)
        child.parent = self  # Circular!
 
# Solution: use weakref for back-reference
class BetterNode:
    def __init__(self):
        self._parent = None
        self.children = []
    
    @property
    def parent(self):
        return self._parent() if self._parent else None
    
    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)  # Weak back-reference, so no strong cycle
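
The effect of the weak back-reference can be checked with the collector switched off. This sketch uses `TreeNode`, a compact stand-in with the same shape as BetterNode, and assumes CPython's reference counting.

```python
import gc
import weakref


class TreeNode:
    """Stand-in for BetterNode: strong links down, weak link up."""

    def __init__(self):
        self._parent = None
        self.children = []

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)


gc.disable()                        # show that the cyclic collector is not needed
try:
    root, leaf = TreeNode(), TreeNode()
    root.add_child(leaf)
    parent_visible = leaf.parent is root
    probe = weakref.ref(root)
    del root                        # drop the last strong reference to the root
    root_freed = probe() is None    # freed by reference counting alone
    parent_after = leaf.parent      # the dead weak reference resolves to None
finally:
    gc.enable()

print(parent_visible, root_freed, parent_after)
```

The trade-off is that a child no longer keeps its parent alive, so code that holds only a child must be prepared for `parent` to return None.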

Callbacks on Collection

import gc
 
def on_collect(phase, info):
    if phase == "start":
        print(f"GC starting: gen {info['generation']}")
    else:
        print(f"GC done: collected {info['collected']}")
 
gc.callbacks.append(on_collect)
 
# Trigger
gc.collect()
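
The same hook can be used to measure how long collections take. The following is a rough sketch; the names `pauses` and `time_collections` are made up for this example, and the timing includes the overhead of the callback itself.

```python
import gc
import time

pauses = []          # (generation, seconds) for each completed collection
_started = None      # timestamp of the collection currently in progress


def time_collections(phase, info):
    """Record the duration between the 'start' and 'stop' callbacks."""
    global _started
    if phase == "start":
        _started = time.perf_counter()
    elif phase == "stop" and _started is not None:
        pauses.append((info["generation"], time.perf_counter() - _started))
        _started = None


gc.callbacks.append(time_collections)
try:
    gc.collect()
finally:
    gc.callbacks.remove(time_collections)   # always unregister the hook

for generation, seconds in pauses:
    print(f"gen {generation}: {seconds * 1000:.3f} ms")
```

Callbacks run inside the collector, so they should stay short and avoid allocating large numbers of objects.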

Performance Optimization

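For tight loops that allocate many container objects, temporarily disabling GC can help. Measure before and after, since the gain depends on the workload: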

import gc
 
gc.disable()
try:
    # Performance-critical code (process is a placeholder for your own function)
    for i in range(1_000_000):
        process(i)
finally:
    gc.enable()
    gc.collect()
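
If this pattern appears in several places, it can be wrapped in a context manager that also restores whatever state the collector was in beforehand. `gc_paused` is a hypothetical helper name, not part of the gc module.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic collection inside the block, then restore the prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:             # do not enable GC if the caller had it off
            gc.enable()


with gc_paused():
    inside = gc.isenabled()
    rows = [{"id": i} for i in range(100_000)]   # stand-in for hot-path work

print(inside, len(rows))
```

Unlike the try/finally version above, this helper does not force a full collection on exit; add gc.collect() after the block if the paused section is expected to create cycles.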

Memory Profiling Pattern

import gc
 
def memory_check():
    gc.collect()
    objects = gc.get_objects()
    
    # Count by type
    type_counts = {}
    for obj in objects:
        t = type(obj).__name__
        type_counts[t] = type_counts.get(t, 0) + 1
    
    # Top 10
    for t, count in sorted(type_counts.items(), 
                           key=lambda x: -x[1])[:10]:
        print(f"{t}: {count}")
 
memory_check()
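
A single snapshot rarely points at a leak; comparing two snapshots taken around the suspect code is usually more telling. Here is a sketch of that idea using collections.Counter, where `Record` and the `leaked` list are hypothetical stand-ins for whatever is accumulating.

```python
import gc
from collections import Counter


def snapshot():
    """Count GC-tracked objects by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Record:
    """Hypothetical type standing in for the leaking objects."""


leaked = []                          # hypothetical leak: a list that only grows

before = snapshot()
leaked.extend(Record() for _ in range(500))
after = snapshot()

growth = after - before              # Counter subtraction keeps positive deltas only
for name, delta in growth.most_common(5):
    print(f"{name}: +{delta}")
```

Types whose counts keep growing across repeated snapshots are the first candidates to inspect with gc.get_referrers().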

Common Issues

Uncollectable Objects

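Before Python 3.4, objects with __del__ methods in cycles could not be collected safely and ended up in gc.garbage. Since Python 3.4 (PEP 442), CPython finalizes and collects most such cycles:

import gc
 
class Leaky:
    def __del__(self):
        print("Cleaning up")
 
a = Leaky()
b = Leaky()
a.ref = b
b.ref = a  # Cycle with __del__
 
a = b = None
gc.collect()  # On Python 3.4+, prints "Cleaning up" twice
print(gc.garbage)  # [] on Python 3.4+; held the uncollectable objects on older versions

Fix: On older versions, avoid __del__ in objects that can form cycles, or break the cycle with weakref. On any version, the order in which __del__ runs within a cycle is not guaranteed, so don't rely on it.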
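
Whether a given interpreter collects cycles that have finalizers can be checked directly. Since PEP 442 (Python 3.4), CPython is expected to run __del__ and free such objects; the sketch below verifies this, using a hypothetical `Resource` class and assuming no gc debug flags are set.

```python
import gc

finalized = []                       # names of objects whose __del__ has run


class Resource:
    """Hypothetical class with a finalizer."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)


gc.collect()                         # start from a clean slate
a, b = Resource("a"), Resource("b")
a.ref, b.ref = b, a                  # cycle between two objects with __del__
del a, b
gc.collect()

leftover = [obj for obj in gc.garbage if isinstance(obj, Resource)]
print(sorted(finalized), leftover)
```

On Python 3.4 and later both finalizers should have run and `leftover` should be empty. A non-empty result would suggest an older interpreter or an active DEBUG_SAVEALL flag.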

Holding References

# Accidentally keeping references
cache = {}
 
def process(key, data):
    result = expensive_computation(data)
    cache[key] = result  # Never cleared!
    return result
 
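# One option: WeakValueDictionary drops an entry once nothing else references
# the value (values must support weak references, so not plain int, str, or list)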
import weakref
cache = weakref.WeakValueDictionary()
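
The behavior of such a cache is easy to misread, so a short demonstration may help. `Result` is a hypothetical wrapper class, needed because built-in values such as int or list cannot be weakly referenced; the immediate eviction shown here assumes CPython's reference counting.

```python
import weakref


class Result:
    """Hypothetical wrapper so the cached value supports weak references."""

    def __init__(self, value):
        self.value = value


cache = weakref.WeakValueDictionary()

result = Result(42)
cache["answer"] = result
cached_while_alive = "answer" in cache       # entry exists while `result` is alive

del result                                   # last strong reference goes away
cached_after_release = "answer" in cache     # entry has been dropped automatically

print(cached_while_alive, cached_after_release)
```

This makes WeakValueDictionary a good fit for sharing objects that are already in use elsewhere, and a poor fit for keeping results around between calls; for the latter, a size-bounded cache such as functools.lru_cache may be more appropriate.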

When to Use gc

  • Debugging memory leaks: gc.get_objects(), gc.get_referrers()
  • Performance tuning: Adjust thresholds or disable temporarily
  • Forcing cleanup: gc.collect() before memory-sensitive operations
  • Long-running services: Monitor gc.get_stats()

Summary

Python's GC handles circular references automatically. The gc module lets you:

  • Force collection with collect()
  • Debug with get_objects(), get_referrers()
  • Tune with set_threshold()
  • Monitor with get_stats()

For most code, you don't need to touch it. But when hunting memory leaks, it's essential.
