I'll be honest: heaps scared me. Whenever I saw "implement a priority queue" or "find the top K elements," I'd freeze. The word "heap" conjured images of memory management and low-level computer science stuff I didn't fully understand.
Then I actually sat down with Python's heapq module. Turns out? It's just a list with superpowers.
What's a Heap, Really?
Before we touch the code, let me explain heaps the way I wish someone had explained them to me.
A heap is a list where the smallest element is always at the front. That's it. That's the magic.
heap = [1, 3, 5, 7, 9, 8, 6]
# ^
# Smallest is always at index 0The rest of the list follows a pattern called the "heap invariant"—each parent is smaller than its children. But honestly? You don't need to understand the tree structure to use heaps effectively. Python handles all that internally.
What you need to know:
- Getting the minimum takes O(1) — just peek at index 0
- Adding an item takes O(log n) — it bubbles up to the right spot
- Removing the minimum takes O(log n) — it rebalances automatically
Compare that to a sorted list where insertion is O(n). For anything involving "give me the smallest thing" repeatedly, heaps win.
The Aha Moment: heappush and heappop
Let me show you the basics that made everything click:
import heapq
# Start with an empty list (that's your heap!)
tasks = []
# Add items with heappush
heapq.heappush(tasks, 5)
heapq.heappush(tasks, 2)
heapq.heappush(tasks, 8)
heapq.heappush(tasks, 1)
print(tasks) # [1, 2, 8, 5] — looks weird, but 1 is at index 0
# Get the smallest with heappop
smallest = heapq.heappop(tasks)
print(smallest) # 1
smallest = heapq.heappop(tasks)
print(smallest) # 2
# The heap rebalances each time
print(tasks) # [5, 8]The key insight: heappop doesn't just remove index 0. It removes the smallest element and reorganizes the heap so the next smallest is now at index 0. That reorganization is O(log n), which is why heaps are fast.
The "Just Peek" Trick
Want to see the minimum without removing it?
heap = [3, 5, 7]
smallest = heap[0] # Just look at index 0!No special function needed. The smallest is always there.
heapify: Converting an Existing List
Sometimes you already have data and want to turn it into a heap. Don't push items one by one—use heapify:
import heapq
scores = [45, 23, 89, 12, 67, 34]
heapq.heapify(scores) # In-place transformation!
print(scores) # [12, 23, 34, 45, 67, 89]
# Now scores behaves like a heap
print(scores[0]) # 12 — the minimumHere's the beautiful part: heapify is O(n), not O(n log n). It's mathematically cheaper than pushing each element individually. I was skeptical about this until I read about the "siftdown" algorithm—trust me, it works.
When I use heapify:
- Loading data from a file/database and need priority ordering
- Converting results from another operation
- Starting a problem with an initial dataset
When I use heappush loop:
- Building the heap dynamically as items arrive
- Streaming data scenarios
nlargest and nsmallest: The Shortcuts I Wish I'd Known Sooner
For weeks, I wrote code like this when I needed the top 3 items:
# The hard way (what I used to do)
sorted_list = sorted(items, reverse=True)
top_3 = sorted_list[:3]Then someone showed me this:
import heapq
scores = [45, 23, 89, 12, 67, 34, 78, 56]
# Top 3 scores
top_3 = heapq.nlargest(3, scores)
print(top_3) # [89, 78, 67]
# Bottom 3 scores
bottom_3 = heapq.nsmallest(3, scores)
print(bottom_3) # [12, 23, 34]These functions are smarter than just sorting. When K is small relative to N, they use a heap internally and avoid sorting the entire list. O(n log k) vs O(n log n).
The key Parameter Changes Everything
Just like sorted(), you can use a key function:
import heapq
employees = [
{"name": "Alice", "salary": 75000},
{"name": "Bob", "salary": 55000},
{"name": "Carol", "salary": 90000},
{"name": "Dave", "salary": 65000},
]
# Highest paid employees
top_earners = heapq.nlargest(2, employees, key=lambda e: e["salary"])
print(top_earners)
# [{'name': 'Carol', 'salary': 90000}, {'name': 'Alice', 'salary': 75000}]
# Lowest paid
lowest = heapq.nsmallest(1, employees, key=lambda e: e["salary"])
print(lowest[0]["name"]) # BobWhen to Use What
This took me a while to internalize:
| Scenario | Best Tool |
|---|---|
| Just need min/max | min() / max() |
| Need top K, K is small | nsmallest / nlargest |
| Need top K, K ≈ N | sorted() |
| Repeated min extraction | heap with heappop |
Understanding the Heap Invariant (Finally)
Okay, let's demystify the "heap invariant." It's not as scary as it sounds.
A min-heap has one rule: every parent must be smaller than or equal to its children.
The heap is stored as a flat list, but conceptually it's a binary tree:
1
/ \
3 5
/ \
7 9
Stored as: [1, 3, 5, 7, 9]
The parent-child relationships use simple index math:
- Parent of index
iis at(i - 1) // 2 - Children of index
iare at2*i + 1and2*i + 2
But here's the thing: you never need to think about this. Python's heapq handles all the bookkeeping. I'm only explaining it because understanding why heaps work helped me trust them.
The invariant explains why the list might look "unsorted":
import heapq
data = [5, 3, 8, 1, 9, 2]
heapq.heapify(data)
print(data) # [1, 3, 2, 5, 9, 8]That list isn't sorted! But it does satisfy the heap property. 1 is less than its children (3 and 2), 3 is less than its children (5 and 9), and so on. The minimum bubbles to the top, but the rest is only partially ordered.
Python Only Has Min-Heaps (And How to Work Around It)
This tripped me up. I needed a max-heap to always get the largest element. But Python's heapq only does min-heaps.
The solution: negate your values.
import heapq
# Max-heap by negating
max_heap = []
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -8)
# "Pop minimum" gives us the most negative (i.e., largest original)
largest = -heapq.heappop(max_heap)
print(largest) # 8It feels hacky, but it's the standard pattern. If you're doing this a lot, wrap it in a class:
import heapq
class MaxHeap:
def __init__(self):
self._heap = []
def push(self, value):
heapq.heappush(self._heap, -value)
def pop(self):
return -heapq.heappop(self._heap)
def peek(self):
return -self._heap[0] if self._heap else None
def __len__(self):
return len(self._heap)Building a Real Priority Queue
Here's where heaps become practical. A priority queue processes items by priority, not arrival order.
The simplest version:
import heapq
class PriorityQueue:
def __init__(self):
self._heap = []
def push(self, item, priority):
# Tuple comparison: compares priority first, then item
heapq.heappush(self._heap, (priority, item))
def pop(self):
priority, item = heapq.heappop(self._heap)
return item
def __bool__(self):
return bool(self._heap)
# Usage
pq = PriorityQueue()
pq.push("fix critical bug", priority=1)
pq.push("update docs", priority=5)
pq.push("add feature", priority=3)
while pq:
print(pq.pop())
# fix critical bug
# add feature
# update docsThe Tie-Breaking Problem
There's a gotcha. If two items have the same priority, Python tries to compare the items themselves:
pq.push({"task": "a"}, priority=1)
pq.push({"task": "b"}, priority=1) # Error! Can't compare dictsThe fix: add a tiebreaker. I use a counter:
import heapq
from dataclasses import dataclass, field
from typing import Any
@dataclass(order=True)
class PrioritizedItem:
priority: int
counter: int
item: Any = field(compare=False)
class PriorityQueue:
def __init__(self):
self._heap = []
self._counter = 0
def push(self, item, priority):
entry = PrioritizedItem(priority, self._counter, item)
heapq.heappush(self._heap, entry)
self._counter += 1
def pop(self):
return heapq.heappop(self._heap).itemThe counter ensures FIFO order for equal priorities, and field(compare=False) excludes the item from comparisons.
Common Patterns That Finally Made Sense
Pattern 1: Top-K Problems
"Find the K largest elements" is everywhere. Interviews, data processing, recommendations.
import heapq
def top_k_scores(scores: list[int], k: int) -> list[int]:
"""Return the K highest scores."""
return heapq.nlargest(k, scores)
# Or with more complex data
def top_k_users_by_points(users: list[dict], k: int) -> list[dict]:
return heapq.nlargest(k, users, key=lambda u: u["points"])But sometimes you're streaming data and can't hold everything in memory. Use a bounded min-heap:
import heapq
def streaming_top_k(stream, k):
"""Track top K items from a stream, memory efficient."""
heap = []
for item in stream:
if len(heap) < k:
heapq.heappush(heap, item)
elif item > heap[0]: # Bigger than smallest in our top-K
heapq.heapreplace(heap, item) # Pop smallest, push new
return sorted(heap, reverse=True)This only keeps K items in memory, no matter how large the stream.
Pattern 2: Merging Sorted Streams
You have multiple sorted lists and need them merged into one sorted stream. This comes up with log files, database results, or external sorting.
import heapq
log_files = [
["2026-03-22 10:01 INFO Started"],
["2026-03-22 10:00 DEBUG Init", "2026-03-22 10:02 ERROR Failed"],
["2026-03-22 10:01 WARN Slow query"],
]
# Merge maintaining chronological order
for line in heapq.merge(*log_files):
print(line)Output is perfectly sorted. And merge is lazy—it doesn't load everything into memory at once.
Pattern 3: K-Way Merge
The merge pattern generalizes to any number of sorted iterables:
import heapq
def merge_sorted_files(file_paths):
"""Merge multiple sorted files into one sorted output."""
files = [open(path) for path in file_paths]
try:
# heapq.merge handles everything
for line in heapq.merge(*files, key=str.lower):
yield line.strip()
finally:
for f in files:
f.close()Pattern 4: Task Scheduling
Schedule tasks to run at specific times:
import heapq
from datetime import datetime, timedelta
class Scheduler:
def __init__(self):
self._tasks = []
self._counter = 0
def schedule(self, delay_seconds: float, task_name: str):
run_at = datetime.now() + timedelta(seconds=delay_seconds)
heapq.heappush(self._tasks, (run_at, self._counter, task_name))
self._counter += 1
def next_task(self):
if self._tasks:
run_at, _, name = heapq.heappop(self._tasks)
return run_at, name
return None
def peek_next(self):
if self._tasks:
return self._tasks[0][0], self._tasks[0][2]
return None
scheduler = Scheduler()
scheduler.schedule(10, "send_email")
scheduler.schedule(5, "process_payment")
scheduler.schedule(15, "generate_report")
# Tasks come out in time order
print(scheduler.next_task()) # (~5s from now, "process_payment")
print(scheduler.next_task()) # (~10s from now, "send_email")Pattern 5: Running Median
This one blew my mind—using TWO heaps to track a running median:
import heapq
class RunningMedian:
"""Track median of a stream using two heaps."""
def __init__(self):
self.small = [] # Max-heap (negated) for smaller half
self.large = [] # Min-heap for larger half
def add(self, num):
# Always add to small first
heapq.heappush(self.small, -num)
# Move largest from small to large
heapq.heappush(self.large, -heapq.heappop(self.small))
# Keep small >= large in size
if len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def median(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
rm = RunningMedian()
for num in [2, 1, 5, 7, 2, 0, 5]:
rm.add(num)
print(f"Added {num}, median now: {rm.median()}")The insight: small heap holds the smaller half of numbers (as a max-heap), large heap holds the larger half. Median is either the top of small (if odd count) or average of both tops.
The Efficiency Shortcuts: heappushpop and heapreplace
Two functions I overlooked at first but now use all the time:
import heapq
heap = [1, 3, 5, 7]
# heappushpop: push then pop (faster than separate calls)
result = heapq.heappushpop(heap, 4)
print(result) # 1 (the old minimum)
print(heap) # [3, 4, 5, 7]
# heapreplace: pop then push (faster than separate calls)
result = heapq.heapreplace(heap, 2)
print(result) # 3 (the old minimum)
print(heap) # [2, 4, 5, 7]The difference is subtle:
heappushpop(heap, item): Pushes item, then pops. If the new item is smallest, you get it right back.heapreplace(heap, item): Pops first, then pushes. The popped item might be smaller than what you're pushing.
I use heapreplace in the streaming top-K pattern—replace the smallest with a new candidate.
Quick Reference
import heapq
# Create heap from list
heapq.heapify(x) # In-place, O(n)
# Add item
heapq.heappush(heap, item) # O(log n)
# Remove and return smallest
heapq.heappop(heap) # O(log n)
# Peek at smallest
heap[0] # O(1)
# Push then pop (efficient combo)
heapq.heappushpop(heap, item) # O(log n)
# Pop then push (efficient combo)
heapq.heapreplace(heap, item) # O(log n)
# Find K smallest/largest
heapq.nsmallest(k, iterable, key=None) # O(n log k)
heapq.nlargest(k, iterable, key=None) # O(n log k)
# Merge sorted iterables
heapq.merge(*iterables, key=None, reverse=False) # O(n) totalWhat I Wish I'd Known Earlier
-
Heaps are just lists. There's no special heap type. Any list can be a heap if you only modify it with
heapqfunctions. -
The "heap" is the invariant, not the data structure. If you append to a heap without using
heappush, you break it. -
nlargest/nsmallest exist. Don't sort when you only need a few elements.
-
Negation is the standard max-heap trick. It's not hacky, it's the pattern.
-
heapify is O(n). Don't push elements one by one when you have existing data.
-
You don't need to understand the tree structure to use heaps. Just know: smallest is always at index 0, and the
heapqfunctions maintain that.
Heaps stopped being scary once I realized they're just a clever way to always keep the minimum accessible. Now I reach for them any time I see "priority," "top K," or "schedule."
The heapq module is in the standard library. Nothing to install. It's been there all along, waiting for you to stop being afraid of it.