When I first encountered UUIDs in a codebase, I saw uuid.uuid4() everywhere and didn't think much of it. It makes unique IDs. Simple, right? Then I discovered there are four different UUID functions, one of them leaks your MAC address, and picking the wrong one can cause real problems. Here's everything I've learned.
What's a UUID?
A UUID (Universally Unique Identifier) is a 128-bit value that looks like this:
550e8400-e29b-41d4-a716-446655440000
The magic: you can generate one anywhere (your laptop, a server in Tokyo, a container in AWS) and, for all practical purposes, it will be unique. No central authority needed. No "get the next ID from the database" round trip.
The Four UUID Versions (and When to Use Each)
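Python's uuid module gives you four classic generator functions. (Python 3.14 adds uuid6(), uuid7(), and uuid8() from RFC 9562, but these four are what most code uses.) Here's what actually matters: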
import uuid
# UUID v1: Time + MAC address
uuid.uuid1()
# UUID v3: MD5 hash of namespace + name
uuid.uuid3(uuid.NAMESPACE_DNS, 'example.com')
# UUID v4: Pure random
uuid.uuid4()
# UUID v5: SHA-1 hash of namespace + name
uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com')
uuid4: Your Default Choice
import uuid
user_id = uuid.uuid4()
print(user_id) # e.g. 7f8d4e2a-9b3c-4f1d-a8e7-2c5f9d3b1a4e (different on every run)
This generates a random UUID. No patterns, no information leakage, no coordination needed. Use this unless you have a specific reason not to.
I use uuid4() for:
- Database primary keys
- Request and correlation IDs
- API resource IDs
- Anything that needs to be unique but not secret
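Under the hood, a version-4 UUID is 16 random bytes with six bits overwritten: four for the version and two for the variant, which leaves 122 random bits. The sketch below rebuilds one by hand to show that. It is roughly what CPython's uuid4() does (os.urandom plus the UUID constructor's version= argument), but `handmade_uuid4` is a hypothetical name for illustration; keep calling uuid.uuid4() in real code.

```python
import os
import uuid

def handmade_uuid4() -> uuid.UUID:
    """Illustrative re-creation of uuid.uuid4(); use the real one in practice."""
    raw = os.urandom(16)  # 128 bits from the OS random source
    # version=4 overwrites the version nibble and sets the RFC 4122 variant bits,
    # leaving 128 - 4 - 2 = 122 random bits
    return uuid.UUID(bytes=raw, version=4)

u = handmade_uuid4()
print(u, u.version)  # version is always 4
print(str(u)[14])    # the 13th hex digit is the version: '4'
```

That fixed digit is why every uuid4 string has a 4 right after the second hyphen.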
uuid5: When You Need Determinism
Sometimes you need the same UUID for the same input. That's uuid5:
import uuid
# Same input = same output, every time
id1 = uuid.uuid5(uuid.NAMESPACE_DNS, 'user@example.com')
id2 = uuid.uuid5(uuid.NAMESPACE_DNS, 'user@example.com')
print(id1 == id2) # True
# Different input = different output
id3 = uuid.uuid5(uuid.NAMESPACE_DNS, 'other@example.com')
print(id1 == id3) # False
This is useful for:
- Idempotent imports (import the same data twice, get the same ID)
- Generating child IDs from parent IDs
- Content-addressed storage
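One caveat for all of these: uuid5 hashes the exact name string, so 'Alice@Example.com' and 'alice@example.com' produce different IDs. If your inputs can vary in case or surrounding whitespace, normalize them first. This sketch assumes trimming and lowercasing is the right normalization for your data (it is not correct for every kind of key); `APP_NS` and `stable_id` are hypothetical names.

```python
import uuid

# Hypothetical application namespace, derived once from a domain you control
APP_NS = uuid.uuid5(uuid.NAMESPACE_DNS, 'myapp.example.com')

def stable_id(name: str) -> uuid.UUID:
    """Deterministic ID that ignores surrounding whitespace and letter case."""
    # Assumption: case and outer whitespace carry no meaning for these names
    normalized = name.strip().lower()
    return uuid.uuid5(APP_NS, normalized)

print(uuid.uuid5(APP_NS, 'Alice@Example.com') == uuid.uuid5(APP_NS, 'alice@example.com'))  # False
print(stable_id(' Alice@Example.com ') == stable_id('alice@example.com'))  # True
```

Whatever normalization you pick, it becomes part of the ID scheme: changing it later changes every derived ID.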
In an application, derive a namespace of your own once and generate every ID inside it:
import uuid
# Create your own namespace for your app
MY_APP_NS = uuid.uuid5(uuid.NAMESPACE_DNS, 'myapp.example.com')
def user_id_from_email(email: str) -> uuid.UUID:
    """Deterministic user ID from email."""
    return uuid.uuid5(MY_APP_NS, email)
# Always the same for the same email
print(user_id_from_email('alice@example.com'))
uuid1: The Dangerous One
import uuid
u = uuid.uuid1()
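print(hex(u.node)) # e.g. 0x1a2b3c4d5e6f ← usually your MAC address!
print(u.time) # 60-bit timestamp: 100-ns intervals since 1582-10-15
uuid1 typically embeds your machine's MAC address and its creation time. (If Python can't read a hardware address, uuid.getnode() falls back to a random 48-bit number.) This is a real privacy and security concern: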
- Attackers can fingerprint your hardware
- They can see when records were created
- They can correlate activity across your system
I avoid uuid1 unless I'm in a closed internal system where:
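- Creation time matters (note that uuid1 values don't sort chronologically as strings or bytes, because the low bits of the timestamp come first; sort on u.time instead)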
- Privacy isn't a concern
- I need to trace when/where something was created
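If you do want uuid1's embedded timestamp but not the MAC address, uuid.uuid1() accepts a node argument. RFC 4122 (section 4.5) suggests a random 48-bit value with the multicast bit set in place of a hardware address, so it can never clash with a real one. A minimal sketch, assuming 48 bits from secrets are an acceptable node for your system; `uuid1_without_mac` is a hypothetical helper. The timestamp is still readable, which is the whole point of using uuid1.

```python
import secrets
import uuid

MULTICAST_BIT = 1 << 40  # lowest bit of the first octet of a 48-bit node

def uuid1_without_mac() -> uuid.UUID:
    """Time-based UUID whose node field is random instead of the host's MAC."""
    # A fresh random node per call; choosing one per process would also work
    node = secrets.randbits(48) | MULTICAST_BIT
    return uuid.uuid1(node=node)

u = uuid1_without_mac()
print(u.version)                     # 1
print(bool(u.node & MULTICAST_BIT))  # True: flagged as not a real MAC address
```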
uuid3: Legacy, Avoid It
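uuid3 is like uuid5 but hashes with MD5 instead of SHA-1. MD5 is the weaker hash, and RFC 4122 itself prefers SHA-1 when backward compatibility isn't an issue, so use uuid5. (Neither makes the ID secret: anyone who knows the namespace and name can recompute it.) The only reason to use uuid3 is backwards compatibility with an existing system.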
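To make the MD5-vs-SHA-1 difference concrete, here is a sketch that re-derives both name-based versions with hashlib, following the RFC 4122 recipe: hash the namespace's 16 bytes followed by the UTF-8 name, keep the first 16 bytes of the digest, and stamp in the version and variant bits. `name_based_uuid` is a hypothetical name for illustration. The two versions share everything except the hash, which is also why they give different IDs for the same input and can't be swapped inside an existing system.

```python
import hashlib
import uuid

def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Illustrative re-derivation of uuid3 (MD5) and uuid5 (SHA-1)."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()   # 16 bytes
    elif version == 5:
        digest = hashlib.sha1(data).digest()  # 20 bytes; only the first 16 are used
    else:
        raise ValueError("name-based UUIDs are version 3 or 5")
    # version= stamps the version nibble and the RFC 4122 variant bits over the hash
    return uuid.UUID(bytes=digest[:16], version=version)

ns, name = uuid.NAMESPACE_DNS, 'example.com'
print(name_based_uuid(ns, name, 3) == uuid.uuid3(ns, name))  # True
print(name_based_uuid(ns, name, 5) == uuid.uuid5(ns, name))  # True
print(uuid.uuid3(ns, name) == uuid.uuid5(ns, name))          # False
```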
Creating UUIDs from Strings
APIs send you UUID strings. Users paste them into forms. Here's how to parse them:
import uuid
# From string (with dashes)
u = uuid.UUID('550e8400-e29b-41d4-a716-446655440000')
# From hex string (no dashes)
u = uuid.UUID('550e8400e29b41d4a716446655440000')
# From bytes
u = uuid.UUID(bytes=b'\x55\x0e\x84\x00...') # must be exactly 16 bytes (truncated here)
# From integer
u = uuid.UUID(int=113059749145936325402354257176981405696)
Validation Pattern
User input needs validation:
import uuid
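def parse_uuid(value: str | None) -> uuid.UUID | None:
    """Parse UUID string, return None if invalid."""
    try:
        return uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        # ValueError: malformed string; TypeError: value is None;
        # AttributeError: value isn't a string at all
        return None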
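# In an API handler (hypothetical; request.params depends on your framework)
def get_user(request):
    user_id = parse_uuid(request.params.get('user_id'))
    if user_id is None:
        return {"error": "Invalid user ID"}
    # ... look up the user by user_id ...
UUID Attributes: hex, bytes, int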
UUIDs aren't just strings. They're objects with useful representations:
import uuid
u = uuid.UUID('550e8400-e29b-41d4-a716-446655440000') # fixed value so the outputs below match
# Standard string (with dashes)
str(u) # '550e8400-e29b-41d4-a716-446655440000'
# Hex string (no dashes, 32 characters)
u.hex # '550e8400e29b41d4a716446655440000'
# Raw bytes (16 bytes)
u.bytes # b'U\x0e\x84\x00\xe2\x9bA\xd4\xa7\x16DfUD\x00\x00'
# Integer (128-bit)
u.int # 113059749145936325402354257176981405696
# URN format
u.urn # 'urn:uuid:550e8400-e29b-41d4-a716-446655440000'
# Version and variant
u.version # 4
u.variant # 'specified in RFC 4122'
When to use each:
- hex: When you need a string without dashes (shorter URLs, file names)
- bytes: Binary protocols, efficient storage, cryptographic operations
- int: Mathematical operations, bit manipulation
- str(): Human-readable displays, JSON APIs
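Every one of these forms carries the full 128 bits, so you can always convert back without loss. A quick sanity check, as a sketch rather than something you'd ship:

```python
import uuid

u = uuid.uuid4()

rebuilt = [
    uuid.UUID(str(u)),         # canonical string with dashes
    uuid.UUID(hex=u.hex),      # 32 hex characters
    uuid.UUID(bytes=u.bytes),  # 16 raw bytes
    uuid.UUID(int=u.int),      # 128-bit integer
    uuid.UUID(u.urn),          # the 'urn:uuid:' prefix is stripped when parsing
]
print(all(r == u for r in rebuilt))  # True
```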
import uuid
u = uuid.uuid4()
# Compact storage in a binary file
with open('ids.bin', 'wb') as f:
    f.write(u.bytes) # 16 bytes instead of 36 characters
# Shorter URL paths
print(f"/users/{u.hex}") # No dashes
UUIDs in Databases
This is where I made my first real mistake. I stored UUIDs as strings. Don't do that.
The Right Way: Native UUID Columns
# SQLAlchemy with PostgreSQL
from sqlalchemy import Column, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
import uuid
Base = declarative_base()
class User(Base):
    __tablename__ = 'users'
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
PostgreSQL has a native UUID type. It stores 16 bytes, indexes efficiently, and rejects malformed values on insert.
The Wrong Way: VARCHAR
-- Don't do this
CREATE TABLE users (
id VARCHAR(36) PRIMARY KEY, -- Wastes space, slow indexes
name VARCHAR(255)
);
VARCHAR(36) wastes space (36 bytes + overhead vs 16 bytes) and makes comparisons slower.
SQLite: Store as BLOB or Text
SQLite doesn't have a UUID type:
import sqlite3
import uuid
conn = sqlite3.connect(':memory:')
# Option 1: Store as 16-byte BLOB (efficient)
conn.execute('CREATE TABLE users (id BLOB PRIMARY KEY, name TEXT)')
user_id = uuid.uuid4()
conn.execute('INSERT INTO users VALUES (?, ?)', (user_id.bytes, 'Alice'))
# Option 2: Store as text (easier to debug)
conn.execute('CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT)')
post_id = uuid.uuid4()
conn.execute('INSERT INTO posts VALUES (?, ?)', (str(post_id), 'Hello'))
UUIDs in APIs
JSON APIs typically use UUID strings:
import uuid
import json
# Serializing
user = {
'id': str(uuid.uuid4()),
'name': 'Alice'
}
json.dumps(user) # Works
# Won't work - UUID isn't JSON serializable by default
user = {
'id': uuid.uuid4(), # UUID object
'name': 'Alice'
}
json.dumps(user) # TypeError!
Custom JSON Encoder
import uuid
import json
class UUIDEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, uuid.UUID):
            return str(obj)
        return super().default(obj)
user = {'id': uuid.uuid4(), 'name': 'Alice'}
json.dumps(user, cls=UUIDEncoder) # Works
Pydantic (My Preferred Approach)
from pydantic import BaseModel
import uuid
class User(BaseModel):
    id: uuid.UUID
    name: str
user = User(id=uuid.uuid4(), name='Alice')
print(user.model_dump_json())
# e.g. {"id":"550e8400-e29b-41d4-a716-446655440000","name":"Alice"} (the id is random)
# Parsing incoming JSON
data = '{"id": "550e8400-e29b-41d4-a716-446655440000", "name": "Bob"}'
user = User.model_validate_json(data)
print(type(user.id)) # <class 'uuid.UUID'>
Security Considerations
uuid1 Leaks Your MAC Address
This bears repeating. Look at what uuid1 reveals:
import uuid
u = uuid.uuid1()
# When it was created: u.time counts 100-nanosecond intervals since 1582-10-15
print(f"Time: {u.time}")
# Your machine's MAC address
print(f"Node (MAC): {u.node:012x}") # e.g., "1a2b3c4d5e6f"
In a security incident, this is valuable intel:
- Attackers can correlate UUIDs across your system
- They know your hardware fingerprint
- They can estimate when you created records
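To see how concrete that estimate is, here is a sketch that turns u.time back into a wall-clock datetime. It relies on the RFC 4122 definition of the timestamp (100-nanosecond intervals since 1582-10-15, UTC); `uuid1_created_at` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID timestamp epoch (the Gregorian calendar reform), in UTC
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_created_at(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version-1 UUID."""
    if u.version != 1:
        raise ValueError(f"expected a version-1 UUID, got version {u.version}")
    # u.time counts 100-ns ticks; timedelta resolves to microseconds, so divide by 10
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

u = uuid.uuid1()
print(uuid1_created_at(u))  # the moment this UUID was generated, in UTC
```

Anyone holding one of your uuid1 values can run the same few lines.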
Always use uuid4 for user-facing or external IDs.
UUIDs Aren't Secrets
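A UUID is designed to be unique, not unguessable. RFC 4122 explicitly warns against assuming UUIDs are hard to guess or using them as security capabilities:
import uuid
import secrets
# BAD: Password reset via UUID in URL
reset_url = f"https://example.com/reset/{uuid.uuid4()}"
# The UUID format doesn't promise unpredictability (uuid1 is outright guessable),
# yet possessing this link is enough to reset the password
# BETTER: Use the secrets module, which exists for exactly this job
reset_token = secrets.token_urlsafe(32)
Use UUIDs for identity. Use the secrets module for security tokens.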
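A minimal sketch of the secrets side, assuming you store only a hash of the token server-side (a common practice, but an assumption here rather than something covered above); `new_reset_token` and `check_reset_token` are hypothetical names.

```python
import hashlib
import secrets

def new_reset_token() -> tuple[str, str]:
    """Return (token to send to the user, SHA-256 hex digest to store)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, 43 URL-safe characters
    digest = hashlib.sha256(token.encode()).hexdigest()
    return token, digest

def check_reset_token(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)

token, digest = new_reset_token()
print(check_reset_token(token, digest))        # True
print(check_reset_token(token + "x", digest))  # False
```

secrets.compare_digest avoids leaking, through timing, how much of a guess matched.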
UUID Collision Probability
A common concern: "What if two UUIDs collide?"
With uuid4 (122 random bits), you'd need to generate about 2.71×10^18 UUIDs to have a 50% chance of a single collision. That's roughly a billion IDs per second for 85 years.
In practice: don't worry about it, as long as your random source is sound.
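The 2.71×10^18 figure comes from the birthday approximation p ≈ 1 - exp(-n² / (2 · 2^122)), since a uuid4 has 122 random bits. Solving for p = 0.5 gives n ≈ √(2 · 2^122 · ln 2). A small sketch that reproduces the numbers (the function names are illustrative):

```python
import math

RANDOM_BITS = 122         # uuid4: 128 bits minus 4 version bits and 2 variant bits
SPACE = 2 ** RANDOM_BITS  # number of distinct uuid4 values

def collision_probability(n: float) -> float:
    """Birthday approximation: chance of at least one collision among n uuid4s."""
    return -math.expm1(-(n * n) / (2 * SPACE))

def ids_for_probability(p: float) -> float:
    """Invert the approximation: how many IDs until the collision chance reaches p."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))

n50 = ids_for_probability(0.5)
seconds_in_85_years = 85 * 365.25 * 24 * 3600
print(f"{n50:.3g} IDs for a 50% chance")                          # 2.71e+18 IDs for a 50% chance
print(f"{n50 / seconds_in_85_years:.3g} IDs/second for 85 years")  # 1.01e+09 IDs/second for 85 years
print(collision_probability(1e12))  # a trillion IDs: on the order of 1e-13
```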
Common Patterns
Pattern 1: Auto-generated IDs with Dataclasses
from dataclasses import dataclass, field
import uuid
@dataclass
class User:
    name: str
    email: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
alice = User(name="Alice", email="alice@example.com")
print(alice.id) # Auto-generated UUID
Pattern 2: Derived IDs
Generate child IDs deterministically from parent IDs:
import uuid
def child_id(parent: uuid.UUID, suffix: str) -> uuid.UUID:
    """Generate deterministic child ID."""
    return uuid.uuid5(parent, suffix)
user_id = uuid.uuid4()
profile_id = child_id(user_id, 'profile')
settings_id = child_id(user_id, 'settings')
# Same user always gets same profile/settings IDs
Pattern 3: Short URL-Safe IDs
UUIDs are long (36 chars). For URLs, you can shorten them:
import uuid
import base64
def uuid_to_short(u: uuid.UUID) -> str:
    """Convert UUID to 22-char URL-safe string."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b'=').decode()
def short_to_uuid(s: str) -> uuid.UUID:
    """Convert back to UUID."""
    padding = '=' * (-len(s) % 4) # restore the stripped '=' padding (0 to 3 chars)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + padding))
u = uuid.uuid4()
short = uuid_to_short(u)
print(short) # e.g. 'VQ6EAOKbQdSnFkRmVUQAAA' (22 chars)
print(short_to_uuid(short) == u) # True
Pattern 4: Request Tracing
Add UUIDs to every request for debugging:
import uuid
import logging
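logger = logging.getLogger(__name__)
def handle_request(request):
    request_id = str(uuid.uuid4())
    # Tag log lines with the ID so one request's messages can be grepped together
    logger.info("[%s] Processing request", request_id)
    # Add to response headers (build_response is a hypothetical framework helper)
    response = build_response(request)
    response.headers['X-Request-ID'] = request_id
    return response
Quick Reference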
import uuid
# Generate (choose one)
uuid.uuid4() # Random - default choice
uuid.uuid5(namespace, name) # Deterministic from input
uuid.uuid1() # Time + MAC (privacy risk!)
# Parse
uuid.UUID('550e8400-e29b-41d4-...') # From string
uuid.UUID(hex='550e8400...') # From hex
uuid.UUID(bytes=b'...') # From bytes
# Convert
str(u) # Standard string with dashes
u.hex # No dashes
u.bytes # 16 bytes
u.int # Integer
# Properties
u.version # 1, 3, 4, or 5 from the generators above (None if the variant isn't RFC 4122)
u.variant # Usually RFC_4122
What I Wish I Knew Earlier
- Start with uuid4. It's the safe default.
- Store natively. PostgreSQL's UUID type, not VARCHAR.
- uuid1 leaks info. Avoid it unless you understand the tradeoff.
- UUIDs aren't secrets. Use the secrets module for tokens.
- Collisions won't happen. Seriously, don't worry about it.
The uuid module is one of those stdlib gems that just works. Pick the right version, use native database types, and you'll rarely think about it again.