Python's gc module gives you control over the garbage collector. Most of the time you don't need it, because Python manages memory automatically. But for debugging memory leaks or optimizing performance-critical code, it's invaluable.
How Python Memory Works
Python uses two mechanisms:
- Reference counting (automatic): Objects are freed when their reference count hits zero
- Garbage collection (gc module): Catches circular references that reference counting misses
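To make the split between the two mechanisms concrete, here is a minimal sketch that uses a weak reference as a probe to see when an object actually goes away. The `Payload` class and both function names are hypothetical, and the behavior shown assumes CPython, where reference counting frees objects immediately.

```python
import gc
import weakref


class Payload:
    """Hypothetical class; plain lists cannot be weakly referenced."""


def freed_by_refcount():
    obj = Payload()
    probe = weakref.ref(obj)   # a weak reference does not keep obj alive
    del obj                    # reference count drops to zero
    return probe() is None     # already gone, no GC pass needed


def freed_only_by_gc():
    a, b = Payload(), Payload()
    a.other, b.other = b, a    # build a two-object cycle
    probe = weakref.ref(a)
    del a, b                   # unreachable, but refcounts stay above zero
    alive_before = probe() is not None
    gc.collect()               # the cycle detector reclaims both objects
    return alive_before, probe() is None


gc.disable()  # keep an automatic collection from interfering with the demo
try:
    print(freed_by_refcount())
    print(freed_only_by_gc())
finally:
    gc.enable()
```

On CPython the first call should report that the object was freed as soon as its last reference went away, while the cycle stays alive until `gc.collect()` runs.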
# Reference counting handles this
x = [1, 2, 3]
x = None # List is freed immediately
# GC handles this
a = []
b = []
a.append(b)
b.append(a) # Circular reference!
a = b = None # Reference count > 0, but unreachable
Basic gc Operations
import gc
# Force garbage collection
gc.collect()
# Check if GC is enabled
print(gc.isenabled()) # True
# Disable (rarely needed)
gc.disable()
gc.enable()
Finding Memory Leaks
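One common way to put gc.get_objects() to work when hunting a leak is to compare per-type object counts before and after the suspect code runs. The sketch below assumes that approach; `Session`, `leaked`, `type_counts`, and `growth` are hypothetical names used only for illustration.

```python
import gc
from collections import Counter


def type_counts():
    """Count GC-tracked objects by type name after a full collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return {type_name: increase} for types whose count went up."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}


class Session:
    """Hypothetical class standing in for whatever is leaking."""


leaked = []  # hypothetical module-level list that is never cleared

before = type_counts()
for _ in range(100):
    leaked.append(Session())
after = type_counts()

print(growth(before, after).get("Session"))
```

Types whose counts keep growing across repeated snapshots are the first candidates to inspect with gc.get_referrers().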
import gc
# Get all objects tracked by GC
all_objects = gc.get_objects()
print(f"Tracked objects: {len(all_objects)}")
# Find objects of a specific type
lists = [obj for obj in gc.get_objects() if isinstance(obj, list)]
print(f"Lists in memory: {len(lists)}")
# Get uncollectable objects
gc.collect()
garbage = gc.garbage # Unreachable objects the collector found but could not free
print(f"Uncollectable: {len(garbage)}") # Usually 0 on Python 3.4+
GC Statistics
import gc
# Current counts per generation (compared against the thresholds below)
print(gc.get_count()) # (count0, count1, count2)
# Collection thresholds
print(gc.get_threshold()) # (700, 10, 10) by default in CPython up to 3.12; may differ in later versions
# Cumulative per-generation stats since the interpreter started
gc.collect()
print(gc.get_stats())
Generational Collection
Python uses three generations:
- Gen 0: New objects, collected frequently
- Gen 1: Survived one collection
- Gen 2: Long-lived objects, collected rarely
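As a rough illustration of the generations, gc.get_stats() returns one dictionary per generation, each with a running "collections" counter. The sketch below (the helper name `collections_per_generation` is hypothetical) reads those counters before and after a full collection.

```python
import gc


def collections_per_generation():
    """Return how many collections each generation has run so far."""
    return [entry["collections"] for entry in gc.get_stats()]


before = collections_per_generation()
gc.collect()                      # full collection, oldest generation included
after = collections_per_generation()

print("thresholds:", gc.get_threshold())
print("before:", before)
print("after: ", after)
```

The counter for the oldest generation should go up by at least one. The younger generations' counters depend on how many automatic collections have already run in the process.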
import gc
# Set thresholds (gen0, gen1, gen2)
gc.set_threshold(1000, 15, 15)
# Collect specific generation
gc.collect(0) # Gen 0 only
gc.collect(1) # Gen 0 and 1
gc.collect(2) # All generations (same as gc.collect())
Debug Flags
import gc
# Enable debug output (each set_debug() call replaces the flags set before it)
gc.set_debug(gc.DEBUG_STATS) # Collection statistics
gc.set_debug(gc.DEBUG_COLLECTABLE) # Show collectable objects
gc.set_debug(gc.DEBUG_UNCOLLECTABLE) # Show uncollectable
gc.set_debug(gc.DEBUG_SAVEALL) # Save all unreachable objects to gc.garbage instead of freeing them
# Combine flags
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)
# Turn off
gc.set_debug(0)
Reference Tracking
import gc
class MyClass:
    pass
obj = MyClass()
# What references this object?
gc.collect()
referrers = gc.get_referrers(obj)
print(f"Referrers: {len(referrers)}")
# What does this object reference?
obj.data = [1, 2, 3]
referents = gc.get_referents(obj)
print(f"Referents: {referents}")
Breaking Circular References
import gc
import weakref
# Problem: circular reference
class Node:
    def __init__(self):
        self.parent = None
        self.children = []
    def add_child(self, child):
        self.children.append(child)
        child.parent = self # Circular!
# Solution: use weakref for back-reference
class BetterNode:
    def __init__(self):
        self._parent = None
        self.children = []
    @property
    def parent(self):
        return self._parent() if self._parent else None
    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self) # No cycle
Callbacks on Collection
import gc
def on_collect(phase, info):
    if phase == "start":
        print(f"GC starting: gen {info['generation']}")
    else:
        print(f"GC done: collected {info['collected']}")
gc.callbacks.append(on_collect)
# Trigger
gc.collect()
Performance Optimization
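For tight, allocation-heavy loops, you can disable GC temporarily to avoid collection pauses:
import gc
gc.disable()
try:
    # Performance-critical code (process() stands in for your own function)
    for i in range(1_000_000):
        process(i)
finally:
    gc.enable()
    gc.collect() # Clean up any cycles created while GC was off
Memory Profiling Pattern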
import gc
def memory_check():
    gc.collect()
    objects = gc.get_objects()
    # Count by type
    type_counts = {}
    for obj in objects:
        t = type(obj).__name__
        type_counts[t] = type_counts.get(t, 0) + 1
    # Top 10
    for t, count in sorted(type_counts.items(),
                           key=lambda x: -x[1])[:10]:
        print(f"{t}: {count}")
memory_check()
Common Issues
Uncollectable Objects
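Before Python 3.4, objects with __del__ methods in cycles could not be collected safely and ended up in gc.garbage. Since Python 3.4 (PEP 442), such cycles are collected normally, so on modern versions this example runs both finalizers and prints an empty list:
import gc
class Leaky:
    def __del__(self):
        print("Cleaning up")
a = Leaky()
b = Leaky()
a.ref = b
b.ref = a # Cycle with __del__
a = b = None
gc.collect()
print(gc.garbage) # Uncollectable objects on Python < 3.4; [] on 3.4+
Fix: Avoid __del__ where possible, or use weakref.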
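One way to follow the weakref route is weakref.finalize, which registers a cleanup callback without defining __del__. The sketch below is only an illustration under that assumption: `TempResource`, its `close()` method, and the `log` list are hypothetical names.

```python
import gc
import weakref


class TempResource:
    """Hypothetical resource holder that cleans up without __del__."""

    def __init__(self, name, log):
        self.name = name
        # The callback must not reference self, or the object would never die.
        self._finalizer = weakref.finalize(self, log.append, f"closed {name}")

    def close(self):
        """Run the cleanup now; later calls are no-ops."""
        self._finalizer()


log = []

res = TempResource("scratch", log)
del res                      # on CPython the finalizer fires right here

a = TempResource("a", log)
b = TempResource("b", log)
a.peer, b.peer = b, a        # the same kind of cycle as above
del a, b
gc.collect()                 # cycle is reclaimed and both finalizers run

print(log)
```

Because the callback only receives the message string, it holds no reference back to the object, so it cannot keep the object alive or resurrect it.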
Holding References
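# Accidentally keeping references
cache = {}
def process(key, data):
    result = expensive_computation(data)
    cache[key] = result # Never cleared!
    return result
# Better: use WeakValueDictionary
# Entries disappear once nothing else references the value.
# Values must support weak references (int, str, list, dict, and tuple do not).
import weakref
cache = weakref.WeakValueDictionary()
When to Use gc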
- Debugging memory leaks: gc.get_objects(), gc.get_referrers()
- Performance tuning: Adjust thresholds or disable temporarily
- Forcing cleanup: gc.collect() before memory-sensitive operations
- Long-running services: Monitor gc.get_stats()
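For the long-running service case, one possible complement to gc.get_stats() is to time each collection with a gc.callbacks hook. This is a minimal sketch, not a production monitor; `pause_log`, `_started`, and `record_pause` are hypothetical names.

```python
import gc
import time

pause_log = []   # (generation, seconds, collected) for each finished collection
_started = {}


def record_pause(phase, info):
    """gc.callbacks hook: time each collection from "start" to "stop"."""
    if phase == "start":
        _started["t"] = time.perf_counter()
    elif phase == "stop" and "t" in _started:
        elapsed = time.perf_counter() - _started.pop("t")
        pause_log.append((info["generation"], elapsed, info["collected"]))


gc.callbacks.append(record_pause)
try:
    gc.collect()
finally:
    gc.callbacks.remove(record_pause)

generation, seconds, collected = pause_log[-1]
print(f"gen {generation}: {seconds * 1000:.3f} ms, {collected} collected")
```

In a real service the hook would stay registered and feed a metrics system rather than a list. The try/finally here only keeps the example from leaving the callback installed.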
Summary
Python's GC handles circular references automatically. The gc module lets you:
- Force collection with collect()
- Debug with get_objects(), get_referrers()
- Tune with set_threshold()
- Monitor with get_stats()
For most code, you don't need to touch it. But when hunting memory leaks, it's essential.