I once spent three hours debugging a "random" test failure. Tests passed individually but failed together. The culprit? A shared mutable object I thought I'd copied. I hadn't.

This post is everything I wish I'd known about Python's copy module before that debugging session.

The Bug That Started It All

Here's a simplified version of what bit me:

DEFAULT_CONFIG = {
    "retries": 3,
    "endpoints": ["api.example.com"]
}
 
def get_config():
    return DEFAULT_CONFIG  # Oops
 
# Test 1
config = get_config()
config["endpoints"].append("backup.example.com")
# Works fine!
 
# Test 2 (runs after Test 1)
config = get_config()
print(config["endpoints"])  
# ['api.example.com', 'backup.example.com'] - Wait, what?

I thought I was getting a fresh config each time. I wasn't. I was getting the same dictionary, and every test was mutating it.

Assignment Isn't Copying

This is Python 101, but it's worth repeating because it trips everyone up:

original = [1, 2, 3]
also_original = original
 
also_original.append(4)
print(original)  # [1, 2, 3, 4] - both changed!

also_original isn't a copy. It's another name for the same object. They're pointing to the same memory location.

print(original is also_original)  # True
print(id(original) == id(also_original))  # True

When you want actual copies, you need the copy module.

Shallow Copy: The Partial Solution

import copy
 
original = [1, 2, [3, 4]]
shallow = copy.copy(original)
 
# New list object
print(original is shallow)  # False
 
# Top-level changes are independent
shallow[0] = 999
print(original)  # [1, 2, [3, 4]] - unchanged, great!
 
# But nested objects are still shared
shallow[2].append(5)
print(original)  # [1, 2, [3, 4, 5]] - oh no

copy.copy() creates a new container but puts the same objects inside it. It's like photocopying a folder—you get a new folder, but the papers inside are the originals.
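You can verify the folder analogy directly with identity checks:

```python
import copy

original = [1, 2, [3, 4]]
shallow = copy.copy(original)

print(shallow is original)        # False - a new outer list
print(shallow[2] is original[2])  # True - the inner list is shared
```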

Python has several ways to make shallow copies:

# Lists
new_list = old_list[:]
new_list = list(old_list)
new_list = old_list.copy()
 
# Dicts
new_dict = dict(old_dict)
new_dict = old_dict.copy()
new_dict = {**old_dict}
 
# All of these are shallow!
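A quick check that these spellings really are shallow:

```python
old_list = [[1], [2]]

# Each spelling yields a new outer list that still shares the inner lists
for clone in (old_list[:], list(old_list), old_list.copy()):
    print(clone is old_list, clone[0] is old_list[0])  # False True
```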

Deep Copy: The Real Deal

import copy
 
original = [1, 2, [3, 4]]
deep = copy.deepcopy(original)
 
# Completely independent
deep[2].append(5)
print(original)  # [1, 2, [3, 4]] - unchanged!
print(deep)      # [1, 2, [3, 4, 5]]

copy.deepcopy() recursively copies everything. New container, new nested containers, all the way down.
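Well, almost all the way down: deepcopy duplicates the containers but returns atomic immutables (ints, strings) unchanged, since copying them would be pointless:

```python
import copy

original = [1, 2, [3, 4]]
deep = copy.deepcopy(original)

print(deep[2] is original[2])  # False - the nested list was copied too
print(deep[0] is original[0])  # True - immutable atoms are reused, not copied
```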

Here's how my config bug should have been fixed:

import copy
 
DEFAULT_CONFIG = {
    "retries": 3,
    "endpoints": ["api.example.com"]
}
 
def get_config():
    return copy.deepcopy(DEFAULT_CONFIG)  # Fixed!

The Mutable Default Argument Trap

While we're talking about copy gotchas, this one bites everyone:

def add_item(item, items=[]):  # BUG!
    items.append(item)
    return items
 
print(add_item("a"))  # ['a']
print(add_item("b"))  # ['a', 'b'] - wait, what?
print(add_item("c"))  # ['a', 'b', 'c'] - this is bad

The default [] is created once when the function is defined, not each time it's called. Every call shares the same list.
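You can see the shared default directly: it's stored on the function object itself.

```python
def add_item(item, items=[]):  # same buggy signature as above
    items.append(item)
    return items

print(add_item.__defaults__)  # ([],) - the default list lives on the function
add_item("a")
print(add_item.__defaults__)  # (['a'],) - and it remembers every mutation
```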

The fix:

def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

This pattern is so common you'll see it everywhere. Burn it into your brain.

The same problem happens with class attributes:

class Team:
    members = []  # Shared across ALL instances!
 
t1 = Team()
t2 = Team()
t1.members.append("Alice")
print(t2.members)  # ['Alice'] - t2 sees Alice too!
 
# Fix: initialize in __init__
class Team:
    def __init__(self):
        self.members = []  # Each instance gets its own

Custom __copy__ and __deepcopy__

Sometimes you need control over how your objects are copied. Maybe you have a cache you don't want duplicated, or a connection you can't copy:

import copy
 
class DatabaseConnection:
    def __init__(self, host, settings):
        self.host = host
        self.settings = settings
        self._connection = None  # Lazily opened
        self._query_cache = {}   # Don't copy this!
    
    def __copy__(self):
        # Shallow copy: same settings dict, fresh cache
        new = DatabaseConnection.__new__(DatabaseConnection)
        new.host = self.host
        new.settings = self.settings  # Shared reference
        new._connection = None
        new._query_cache = {}
        return new
    
    def __deepcopy__(self, memo):
        # Deep copy: independent settings, fresh cache
        new = DatabaseConnection.__new__(DatabaseConnection)
        new.host = self.host
        new.settings = copy.deepcopy(self.settings, memo)
        new._connection = None
        new._query_cache = {}
        return new

The memo dictionary in __deepcopy__ is important—it tracks objects already copied to handle circular references. Always pass it to nested deepcopy() calls.
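The memo has a nice side effect even without circular references: an object referenced twice is copied once, so internal sharing survives the copy.

```python
import copy

shared = [1, 2]
data = [shared, shared]  # the same list referenced twice

clone = copy.deepcopy(data)
print(clone[0] is clone[1])  # True - internal sharing preserved via the memo
print(clone[0] is shared)    # False - but it's still a copy
```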

Circular References

Python handles circular references gracefully:

import copy
 
a = [1, 2]
a.append(a)  # a now contains itself!
print(a)  # [1, 2, [...]]
 
b = copy.deepcopy(a)
print(b[2] is b)  # True - circular structure preserved
print(b is a)     # False - it's a copy

The memo dictionary is how deepcopy tracks what's already been copied, preventing infinite loops.

When Copies Actually Matter

Not everything needs copying. Here's my mental checklist:

You need deepcopy when:

  • You're modifying nested mutable structures
  • You're storing state for later comparison (undo/redo)
  • You're passing data to code you don't control
  • Tests are failing mysteriously (check for shared state!)

Shallow copy is fine when:

  • Your structure is flat (no nested lists/dicts)
  • You're only reading, not modifying
  • Nested objects are immutable (tuples, frozensets, strings)

No copy needed when:

  • Everything is immutable
  • You want changes to propagate (intentional sharing)
  • You're iterating without modification
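To make the "shallow copy is fine" case concrete: when the nested objects are immutable, sharing them is harmless.

```python
import copy

# Flat dict whose values are immutable: a shallow copy is fully safe
original = {"name": "alice", "tags": ("admin", "ops")}
clone = copy.copy(original)

clone["name"] = "bob"
print(original["name"])                   # alice - rebinding a key doesn't leak
print(clone["tags"] is original["tags"])  # True - shared, but can't be mutated
```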

Performance: deepcopy Is Slow

Here's the uncomfortable truth:

import copy
import time
 
# Nested structure
data = {
    "users": [{"name": "Alice", "tags": ["admin"]} for _ in range(1000)],
    "metadata": {"created": "2024-01-01", "version": 1}
}
 
# Benchmark
start = time.perf_counter()
for _ in range(1000):
    copy.copy(data)
print(f"Shallow: {time.perf_counter() - start:.3f}s")
 
start = time.perf_counter()
for _ in range(1000):
    copy.deepcopy(data)
print(f"Deep: {time.perf_counter() - start:.3f}s")
 
# Typical output:
# Shallow: 0.002s
# Deep: 2.100s

deepcopy can be 100-1000x slower. For hot paths, consider:

  1. Restructure to avoid copying - Can you make things immutable?
  2. Copy only what changes - Partial deep copy
  3. Lazy copying (copy-on-write) - Only copy when modified

# Instead of deep copying a huge config every time:
def get_config_for_request(overrides):
    # Start with shallow copy of top level
    config = DEFAULT_CONFIG.copy()
    
    # Only deep copy the parts we're modifying
    if "endpoints" in overrides:
        config["endpoints"] = copy.deepcopy(DEFAULT_CONFIG["endpoints"])
        config["endpoints"].extend(overrides["endpoints"])
    
    return config
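And one sketch of option 1, assuming the config can be expressed as read-only data (FROZEN_CONFIG and the tuple of endpoints are illustrative): wrap it in a mappingproxy so accidental mutation raises instead of silently corrupting shared state, and no copying is needed at all.

```python
from types import MappingProxyType

# Hypothetical read-only config: mappingproxy over a dict of immutables
FROZEN_CONFIG = MappingProxyType({
    "retries": 3,
    "endpoints": ("api.example.com",),  # tuple instead of list
})

try:
    FROZEN_CONFIG["retries"] = 5
except TypeError:
    print("mutation blocked - no defensive copies required")
```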

Practical Patterns

Pattern 1: Undo/Redo Stack

import copy
 
class Editor:
    def __init__(self):
        self.state = {"text": "", "cursor": 0}
        self._undo_stack = []
        self._redo_stack = []
    
    def _save_state(self):
        self._undo_stack.append(copy.deepcopy(self.state))
        self._redo_stack.clear()
    
    def type_text(self, text):
        self._save_state()
        self.state["text"] += text
        self.state["cursor"] += len(text)
    
    def undo(self):
        if self._undo_stack:
            self._redo_stack.append(copy.deepcopy(self.state))
            self.state = self._undo_stack.pop()
    
    def redo(self):
        if self._redo_stack:
            self._undo_stack.append(copy.deepcopy(self.state))
            self.state = self._redo_stack.pop()
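Stripped of the class machinery, the pattern boils down to one move: deep-copy the state before each mutation, and restore a snapshot to undo.

```python
import copy

state = {"text": "", "cursor": 0}
undo_stack = []

undo_stack.append(copy.deepcopy(state))  # snapshot before mutating
state["text"] += "hello"
state["cursor"] += 5

state = undo_stack.pop()  # undo restores the snapshot
print(state)  # {'text': '', 'cursor': 0}
```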

Pattern 2: Safe Defaults

import copy
from dataclasses import dataclass, field
 
# For functions
def process_data(data, options=None):
    # Copy the caller's dict (or start fresh) so mutations don't leak back
    options = dict(options) if options is not None else {}
    
# For dataclasses (field with default_factory)
@dataclass
class Request:
    headers: dict = field(default_factory=dict)
    params: list = field(default_factory=list)
 
# For class attributes that need copying
class APIClient:
    DEFAULT_HEADERS = {"Accept": "application/json"}
    
    def __init__(self):
        self.headers = copy.copy(self.DEFAULT_HEADERS)
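The default_factory version can be checked the same way as the function trap (Request reproduced here as a runnable snippet):

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    headers: dict = field(default_factory=dict)
    params: list = field(default_factory=list)

r1 = Request()
r2 = Request()
r1.headers["Accept"] = "application/json"
print(r2.headers)  # {} - each instance got its own dict from the factory
```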

Pattern 3: Snapshot Comparison

import copy
 
def has_state_changed(snapshot, current):
    """Deep equality check against a previously saved snapshot."""
    # The snapshot was deep-copied when it was taken; comparison
    # doesn't mutate anything, so no further copying is needed
    return snapshot != current
 
# Useful for "save changes?" prompts
initial_doc = load_document()
working_doc = copy.deepcopy(initial_doc)
 
# ... user edits working_doc ...
 
if has_state_changed(initial_doc, working_doc):
    prompt_save()

Quick Reference

What you want                    | What to use
-------------------------------- | ------------------------
Another name for same object     | = assignment
New container, shared contents   | copy.copy() or .copy()
Completely independent copy      | copy.deepcopy()
Custom copy behavior             | __copy__ / __deepcopy__
Safe mutable default             | None + check

The Lesson

Every time I've had a "weird" bug with data appearing where it shouldn't, or tests mysteriously affecting each other, the answer has been shared mutable state.

Now I ask myself: "Am I copying this, or just pointing to it?" If there's any doubt, copy.deepcopy() is cheap insurance outside of hot paths. Your future self (debugging at midnight) will thank you.

The three-hour debugging session that started this journey? I fixed it in two lines once I understood the problem:

import copy
 
def get_config():
    return copy.deepcopy(DEFAULT_CONFIG)

Learn from my pain.
