I once spent three hours debugging a "random" test failure. Tests passed individually but failed together. The culprit? A shared mutable object I thought I'd copied. I hadn't.
This post is everything I wish I'd known about Python's copy module before that debugging session.
The Bug That Started It All
Here's a simplified version of what bit me:
```python
DEFAULT_CONFIG = {
    "retries": 3,
    "endpoints": ["api.example.com"]
}

def get_config():
    return DEFAULT_CONFIG  # Oops

# Test 1
config = get_config()
config["endpoints"].append("backup.example.com")
# Works fine!

# Test 2 (runs after Test 1)
config = get_config()
print(config["endpoints"])
# ['api.example.com', 'backup.example.com'] - Wait, what?
```

I thought I was getting a fresh config each time. I wasn't. I was getting the same dictionary, and every test was mutating it.
Assignment Isn't Copying
This is Python 101, but it's worth repeating because it trips everyone up:
```python
original = [1, 2, 3]
also_original = original
also_original.append(4)
print(original)  # [1, 2, 3, 4] - both changed!
```

`also_original` isn't a copy. It's another name for the same object. Both names point to the same object in memory.

```python
print(original is also_original)          # True
print(id(original) == id(also_original))  # True
```

When you want actual copies, you need the `copy` module.
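Before reaching for the copy module, one flip side worth a quick sketch: rebinding a name is not mutation, so it never affects the other name. Only in-place mutation travels across an alias.

```python
original = [1, 2, 3]
alias = original

alias = [9, 9, 9]  # Rebinding: alias now names a brand-new list
print(original)    # [1, 2, 3] - untouched; only mutation travels
```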
Shallow Copy: The Partial Solution
```python
import copy

original = [1, 2, [3, 4]]
shallow = copy.copy(original)

# New list object
print(original is shallow)  # False

# Top-level changes are independent
shallow[0] = 999
print(original)  # [1, 2, [3, 4]] - unchanged, great!

# But nested objects are still shared
shallow[2].append(5)
print(original)  # [1, 2, [3, 4, 5]] - oh no
```

`copy.copy()` creates a new container but puts the same objects inside it. It's like photocopying a folder: you get a new folder, but the papers inside are the originals.
Python has several ways to make shallow copies:
```python
# Lists
new_list = old_list[:]
new_list = list(old_list)
new_list = old_list.copy()

# Dicts
new_dict = dict(old_dict)
new_dict = old_dict.copy()
new_dict = {**old_dict}

# All of these are shallow!
```

Deep Copy: The Real Deal
```python
import copy

original = [1, 2, [3, 4]]
deep = copy.deepcopy(original)

# Completely independent
deep[2].append(5)
print(original)  # [1, 2, [3, 4]] - unchanged!
print(deep)      # [1, 2, [3, 4, 5]]
```

`copy.deepcopy()` recursively copies everything. New container, new nested containers, all the way down.
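One detail worth knowing, and easy to verify: `deepcopy` doesn't bother duplicating atomic immutables like ints and strings, because nothing can ever be mutated through them anyway. A quick check:

```python
import copy

data = [1, "hello", (2, 3)]
deep = copy.deepcopy(data)

print(deep is data)        # False - the list itself is new
print(deep[1] is data[1])  # True - the string is shared (safe: it's immutable)
```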
Here's how my config bug should have been fixed:
```python
import copy

DEFAULT_CONFIG = {
    "retries": 3,
    "endpoints": ["api.example.com"]
}

def get_config():
    return copy.deepcopy(DEFAULT_CONFIG)  # Fixed!
```

The Mutable Default Argument Trap
While we're talking about copy gotchas, this one bites everyone:
```python
def add_item(item, items=[]):  # BUG!
    items.append(item)
    return items

print(add_item("a"))  # ['a']
print(add_item("b"))  # ['a', 'b'] - wait, what?
print(add_item("c"))  # ['a', 'b', 'c'] - this is bad
```

The default `[]` is created once, when the function is defined, not each time it's called. Every call shares the same list.
The fix:
```python
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
```

This pattern is so common you'll see it everywhere. Burn it into your brain.
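You can actually watch the shared default through the function's `__defaults__` attribute, which holds the default values on the function object itself:

```python
def add_item(item, items=[]):  # The buggy version again
    items.append(item)
    return items

add_item("a")
add_item("b")
# The one-and-only default list, stored on the function object
print(add_item.__defaults__)  # (['a', 'b'],)
```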
The same problem happens with class attributes:
```python
class Team:
    members = []  # Shared across ALL instances!

t1 = Team()
t2 = Team()
t1.members.append("Alice")
print(t2.members)  # ['Alice'] - t2 sees Alice too!

# Fix: initialize in __init__
class Team:
    def __init__(self):
        self.members = []  # Each instance gets its own
```

Custom __copy__ and __deepcopy__
Sometimes you need control over how your objects are copied. Maybe you have a cache you don't want duplicated, or a connection you can't copy:
```python
import copy

class DatabaseConnection:
    def __init__(self, host, settings):
        self.host = host
        self.settings = settings
        self._connection = None  # Lazily opened
        self._query_cache = {}   # Don't copy this!

    def __copy__(self):
        # Shallow copy: same settings dict, fresh cache
        new = DatabaseConnection.__new__(DatabaseConnection)
        new.host = self.host
        new.settings = self.settings  # Shared reference
        new._connection = None
        new._query_cache = {}
        return new

    def __deepcopy__(self, memo):
        # Deep copy: independent settings, fresh cache
        new = DatabaseConnection.__new__(DatabaseConnection)
        memo[id(self)] = new  # Register before recursing, in case settings refers back to self
        new.host = self.host
        new.settings = copy.deepcopy(self.settings, memo)
        new._connection = None
        new._query_cache = {}
        return new
```

The `memo` dictionary in `__deepcopy__` is important: it tracks objects that have already been copied, which is how circular references are handled. Always pass it to nested `deepcopy()` calls, and register the new object in it before recursing.
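One consequence of the memo worth seeing for yourself: an object that appears twice in a structure is copied exactly once, so internal sharing is preserved inside the copy while staying independent of the original.

```python
import copy

inner = {"value": 1}
outer = [inner, inner]  # The same dict, referenced twice

dup = copy.deepcopy(outer)
print(dup[0] is dup[1])  # True - sharing preserved within the copy
print(dup[0] is inner)   # False - but independent of the original
```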
Circular References
Python handles circular references gracefully:
```python
import copy

a = [1, 2]
a.append(a)  # a now contains itself!
print(a)  # [1, 2, [...]]

b = copy.deepcopy(a)
print(b[2] is b)  # True - circular structure preserved
print(b is a)     # False - it's a copy
```

The memo dictionary is how `deepcopy` tracks what's already been copied, preventing infinite loops.
When Copies Actually Matter
Not everything needs copying. Here's my mental checklist:
You need deepcopy when:
- You're modifying nested mutable structures
- You're storing state for later comparison (undo/redo)
- You're passing data to code you don't control
- Tests are failing mysteriously (check for shared state!)
Shallow copy is fine when:
- Your structure is flat (no nested lists/dicts)
- You're only reading, not modifying
- Nested objects are immutable (tuples, frozensets, strings)
No copy needed when:
- Everything is immutable
- You want changes to propagate (intentional sharing)
- You're iterating without modification
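A quick sketch of the middle case: when every nested value is immutable, a shallow copy is effectively as safe as a deep one, because the "shared" objects can't be mutated through either name.

```python
import copy

# Nested values are all immutable (tuples, strings, ints)
original = {"point": (1, 2), "label": "start"}
shallow = copy.copy(original)

# Top-level reassignment never touches the original
shallow["point"] = (3, 4)
print(original["point"])  # (1, 2) - unchanged

# The string is shared, but that's harmless:
# there is no way to mutate it in place
print(original["label"] is shallow["label"])  # True
```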
Performance: deepcopy Is Slow
Here's the uncomfortable truth:
```python
import copy
import time

# Nested structure
data = {
    "users": [{"name": "Alice", "tags": ["admin"]} for _ in range(1000)],
    "metadata": {"created": "2024-01-01", "version": 1}
}

# Benchmark
start = time.perf_counter()
for _ in range(1000):
    copy.copy(data)
print(f"Shallow: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
for _ in range(1000):
    copy.deepcopy(data)
print(f"Deep: {time.perf_counter() - start:.3f}s")

# Typical output:
# Shallow: 0.002s
# Deep: 2.100s
```

`deepcopy` can be 100-1000x slower. For hot paths, consider:
- Restructure to avoid copying - Can you make things immutable?
- Copy only what changes - Partial deep copy
- Lazy copying (copy-on-write) - Only copy when modified
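The third option can be sketched with a tiny wrapper. This is an illustrative copy-on-write holder; the class name and API are made up for this post, not part of the copy module:

```python
import copy

class CowDict:
    """Wraps a dict and deep-copies it only on first write (hypothetical helper)."""
    def __init__(self, shared):
        self._data = shared
        self._owned = False  # True once we've made our private copy

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not self._owned:
            self._data = copy.deepcopy(self._data)  # Pay the cost only now
            self._owned = True
        self._data[key] = value

shared = {"retries": 3, "endpoints": ["api.example.com"]}
view = CowDict(shared)
print(view["retries"])    # 3 - reads are free, no copy yet
view["retries"] = 5       # First write triggers the deep copy
print(shared["retries"])  # 3 - original untouched
```

Reads through the wrapper stay cheap; the expensive `deepcopy` happens at most once, and only for views that actually write.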
```python
# Instead of deep copying a huge config every time:
def get_config_for_request(overrides):
    # Start with a shallow copy of the top level
    config = DEFAULT_CONFIG.copy()
    # Only deep copy the parts we're modifying
    if "endpoints" in overrides:
        config["endpoints"] = copy.deepcopy(DEFAULT_CONFIG["endpoints"])
        config["endpoints"].extend(overrides["endpoints"])
    return config
```

Practical Patterns
Pattern 1: Undo/Redo Stack
```python
import copy

class Editor:
    def __init__(self):
        self.state = {"text": "", "cursor": 0}
        self._undo_stack = []
        self._redo_stack = []

    def _save_state(self):
        self._undo_stack.append(copy.deepcopy(self.state))
        self._redo_stack.clear()

    def type_text(self, text):
        self._save_state()
        self.state["text"] += text
        self.state["cursor"] += len(text)

    def undo(self):
        if self._undo_stack:
            self._redo_stack.append(copy.deepcopy(self.state))
            self.state = self._undo_stack.pop()

    def redo(self):
        if self._redo_stack:
            self._undo_stack.append(copy.deepcopy(self.state))
            self.state = self._redo_stack.pop()
```

Pattern 2: Safe Defaults
```python
import copy
from dataclasses import dataclass, field

# For functions
def process_data(data, options=None):
    options = options if options is not None else {}
    # Now safe to modify options

# For dataclasses (field with default_factory)
@dataclass
class Request:
    headers: dict = field(default_factory=dict)
    params: list = field(default_factory=list)

# For class attributes that need copying
class APIClient:
    DEFAULT_HEADERS = {"Accept": "application/json"}

    def __init__(self):
        self.headers = copy.copy(self.DEFAULT_HEADERS)
```

Pattern 3: Snapshot Comparison
```python
import copy

def has_state_changed(snapshot, current):
    """Compare the current state against an earlier deep snapshot."""
    # dict/list equality already compares by value, so != is enough here;
    # the deep copy matters when the snapshot is *taken*, not when comparing
    return snapshot != current

# Useful for "save changes?" prompts
initial_doc = load_document()
working_doc = copy.deepcopy(initial_doc)  # Independent copy for the user to edit

# ... user edits working_doc ...

if has_state_changed(initial_doc, working_doc):
    prompt_save()
```

Quick Reference
| What you want | What to use |
|---|---|
| Another name for same object | `=` assignment |
| New container, shared contents | `copy.copy()` or `.copy()` |
| Completely independent copy | `copy.deepcopy()` |
| Custom copy behavior | `__copy__` / `__deepcopy__` |
| Safe mutable default | `None` + check |
The Lesson
Every time I've had a "weird" bug with data appearing where it shouldn't, or tests mysteriously affecting each other, the answer has been shared mutable state.
Now I ask myself: "Am I copying this, or just pointing to it?" When in doubt, and outside of hot paths, `copy.deepcopy()` is cheap insurance. Your future self (debugging at midnight) will thank you.
The three-hour debugging session that started this journey? I fixed it in two lines once I understood the problem:
```python
import copy

def get_config():
    return copy.deepcopy(DEFAULT_CONFIG)
```

Learn from my pain.