Understanding Python UUIDs: A Junior Engineer's Guide

When I first encountered UUIDs in a codebase, I saw uuid.uuid4() everywhere and didn't think much of it. It makes unique IDs. Simple, right? Then I discovered there are four different UUID functions, some leak your MAC address, and picking the wrong one can cause real problems. Here's everything I've learned.

What's a UUID?

A UUID (Universally Unique Identifier) is a 128-bit value that looks like this:

550e8400-e29b-41d4-a716-446655440000

The magic: you can generate one anywhere—your laptop, a server in Tokyo, a container in AWS—and it'll be unique. No central authority needed. No "get the next ID from the database" round trip.

The Four UUID Versions (and When to Use Each)

Python's uuid module gives you four functions. Here's what actually matters:

import uuid
 
# UUID v1: Time + MAC address
uuid.uuid1()
 
# UUID v3: MD5 hash of namespace + name
uuid.uuid3(uuid.NAMESPACE_DNS, 'example.com')
 
# UUID v4: Pure random
uuid.uuid4()
 
# UUID v5: SHA-1 hash of namespace + name
uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com')

uuid4: Your Default Choice

import uuid
 
user_id = uuid.uuid4()
print(user_id)  # 7f8d4e2a-9b3c-4f1d-a8e7-2c5f9d3b1a4e

This generates a random UUID. No patterns, no information leakage, no coordination needed. Use this unless you have a specific reason not to.

I use uuid4() for:

Database primary keys
Session tokens
API resource IDs
Anything that needs to be unique

uuid5: When You Need Determinism

Sometimes you need the same UUID for the same input. That's uuid5:

import uuid
 
# Same input = same output, every time
id1 = uuid.uuid5(uuid.NAMESPACE_DNS, 'user@example.com')
id2 = uuid.uuid5(uuid.NAMESPACE_DNS, 'user@example.com')
 
print(id1 == id2)  # True
 
# Different input = different output
id3 = uuid.uuid5(uuid.NAMESPACE_DNS, 'other@example.com')
print(id1 == id3)  # False

This is useful for:

Idempotent imports (import the same data twice, get the same ID)
Generating child IDs from parent IDs
Content-addressed storage

import uuid
 
# Create your own namespace for your app
MY_APP_NS = uuid.uuid5(uuid.NAMESPACE_DNS, 'myapp.example.com')
 
def user_id_from_email(email: str) -> uuid.UUID:
    """Deterministic user ID from email."""
    return uuid.uuid5(MY_APP_NS, email)
 
# Always the same for the same email
print(user_id_from_email('alice@example.com'))

uuid1: The Dangerous One

import uuid
 
u = uuid.uuid1()
print(hex(u.node))  # 0x1a2b3c4d5e6f  ← Your MAC address!
print(u.time)       # Timestamp of creation

uuid1 leaks your machine's MAC address and creation time. This is a real privacy and security concern:

Attackers can fingerprint your hardware
They can see when records were created
They can correlate activity across your system

I avoid uuid1 unless I'm in a closed internal system where:

Sorting by creation time matters
Privacy isn't a concern
I need to trace when/where something was created

uuid3: Legacy, Avoid It

uuid3 is like uuid5 but uses MD5 instead of SHA-1. MD5 has known weaknesses. Use uuid5 instead. The only reason to use uuid3 is backwards compatibility with an existing system.

Creating UUIDs from Strings

APIs send you UUID strings. Users paste them into forms. Here's how to parse them:

import uuid
 
# From string (with dashes)
u = uuid.UUID('550e8400-e29b-41d4-a716-446655440000')
 
# From hex string (no dashes)
u = uuid.UUID('550e8400e29b41d4a716446655440000')
 
# From bytes
u = uuid.UUID(bytes=b'\x55\x0e\x84\x00...')
 
# From integer
u = uuid.UUID(int=113059749145936325402354257176981405696)

Validation Pattern

User input needs validation:

import uuid
 
def parse_uuid(value: str) -> uuid.UUID | None:
    """Parse UUID string, return None if invalid."""
    try:
        return uuid.UUID(value)
    except (ValueError, AttributeError):
        return None
 
# In an API handler
user_id = parse_uuid(request.params.get('user_id'))
if user_id is None:
    return {"error": "Invalid user ID"}

UUID Attributes: hex, bytes, int

UUIDs aren't just strings. They're objects with useful representations:

import uuid
 
u = uuid.uuid4()
 
# Standard string (with dashes)
str(u)          # '550e8400-e29b-41d4-a716-446655440000'
 
# Hex string (no dashes, 32 characters)
u.hex           # '550e8400e29b41d4a716446655440000'
 
# Raw bytes (16 bytes)
u.bytes         # b'U\x0e\x84\x00\xe2\x9bA\xd4\xa7\x16DfUD\x00\x00'
 
# Integer (128-bit)
u.int           # 113059749145936325402354257176981405696
 
# URN format
u.urn           # 'urn:uuid:550e8400-e29b-41d4-a716-446655440000'
 
# Version and variant
u.version       # 4
u.variant       # 'specified in RFC 4122'

When to use each:

hex: When you need a string without dashes (shorter URLs, file names)
bytes: Binary protocols, efficient storage, cryptographic operations
int: Mathematical operations, bit manipulation
str(): Human-readable displays, JSON APIs

import uuid
 
u = uuid.uuid4()
 
# Compact storage in a binary file
with open('ids.bin', 'wb') as f:
    f.write(u.bytes)  # 16 bytes instead of 36 characters
 
# Shorter URL paths
print(f"/users/{u.hex}")  # No dashes

UUIDs in Databases

This is where I made my first real mistake. I stored UUIDs as strings. Don't do that.

The Right Way: Native UUID Columns

# SQLAlchemy with PostgreSQL
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import UUID
import uuid
 
class User(Base):
    __tablename__ = 'users'
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)

PostgreSQL has a native UUID type. It stores 16 bytes, indexes efficiently, and validates format automatically.

The Wrong Way: VARCHAR

-- Don't do this
CREATE TABLE users (
    id VARCHAR(36) PRIMARY KEY,  -- Wastes space, slow indexes
    name VARCHAR(255)
);

VARCHAR(36) wastes space (36 bytes + overhead vs 16 bytes) and makes comparisons slower.

SQLite: Store as BLOB or Text

SQLite doesn't have a UUID type:

import sqlite3
import uuid
 
conn = sqlite3.connect(':memory:')
 
# Option 1: Store as 16-byte BLOB (efficient)
conn.execute('CREATE TABLE users (id BLOB PRIMARY KEY, name TEXT)')
user_id = uuid.uuid4()
conn.execute('INSERT INTO users VALUES (?, ?)', (user_id.bytes, 'Alice'))
 
# Option 2: Store as text (easier to debug)
conn.execute('CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT)')
post_id = uuid.uuid4()
conn.execute('INSERT INTO posts VALUES (?, ?)', (str(post_id), 'Hello'))

UUIDs in APIs

JSON APIs typically use UUID strings:

import uuid
import json
 
# Serializing
user = {
    'id': str(uuid.uuid4()),
    'name': 'Alice'
}
json.dumps(user)  # Works
 
# Won't work - UUID isn't JSON serializable by default
user = {
    'id': uuid.uuid4(),  # UUID object
    'name': 'Alice'
}
json.dumps(user)  # TypeError!

Custom JSON Encoder

import uuid
import json
 
class UUIDEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, uuid.UUID):
            return str(obj)
        return super().default(obj)
 
user = {'id': uuid.uuid4(), 'name': 'Alice'}
json.dumps(user, cls=UUIDEncoder)  # Works

Pydantic (My Preferred Approach)

from pydantic import BaseModel
import uuid
 
class User(BaseModel):
    id: uuid.UUID
    name: str
 
user = User(id=uuid.uuid4(), name='Alice')
print(user.model_dump_json())
# {"id":"550e8400-e29b-41d4-a716-446655440000","name":"Alice"}
 
# Parsing incoming JSON
data = '{"id": "550e8400-e29b-41d4-a716-446655440000", "name": "Bob"}'
user = User.model_validate_json(data)
print(type(user.id))  # <class 'uuid.UUID'>

Security Considerations

uuid1 Leaks Your MAC Address

This bears repeating. Look at what uuid1 reveals:

import uuid
 
u = uuid.uuid1()
 
# When it was created (100-nanosecond precision since 1582)
print(f"Time: {u.time}")
 
# Your machine's MAC address
print(f"Node (MAC): {u.node:012x}")  # e.g., "1a2b3c4d5e6f"

In a security incident, this is valuable intel:

Attackers can correlate UUIDs across your system
They know your hardware fingerprint
They can estimate when you created records

Always use uuid4 for user-facing or external IDs.

UUIDs Aren't Secrets

A UUID is unique, not secret. Anyone who sees it can use it:

# BAD: Password reset via UUID in URL
reset_url = f"https://example.com/reset/{uuid.uuid4()}"
# If someone intercepts this, they can reset the password
 
# BETTER: Use secrets module for security-sensitive tokens
import secrets
reset_token = secrets.token_urlsafe(32)

Use UUIDs for identity. Use the secrets module for security tokens.

UUID Collision Probability

A common concern: "What if two UUIDs collide?"

With uuid4, you'd need to generate about 2.71×10^18 UUIDs to have a 50% chance of collision. That's billions of IDs per second for 85 years.

In practice: don't worry about it. The sun will burn out first.

Common Patterns

Pattern 1: Auto-generated IDs with Dataclasses

from dataclasses import dataclass, field
import uuid
 
@dataclass
class User:
    name: str
    email: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
 
alice = User(name="Alice", email="alice@example.com")
print(alice.id)  # Auto-generated UUID

Pattern 2: Derived IDs

Generate child IDs deterministically from parent IDs:

import uuid
 
def child_id(parent: uuid.UUID, suffix: str) -> uuid.UUID:
    """Generate deterministic child ID."""
    return uuid.uuid5(parent, suffix)
 
user_id = uuid.uuid4()
profile_id = child_id(user_id, 'profile')
settings_id = child_id(user_id, 'settings')
 
# Same user always gets same profile/settings IDs

Pattern 3: Short URL-Safe IDs

UUIDs are long (36 chars). For URLs, you can shorten them:

import uuid
import base64
 
def uuid_to_short(u: uuid.UUID) -> str:
    """Convert UUID to 22-char URL-safe string."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b'=').decode()
 
def short_to_uuid(s: str) -> uuid.UUID:
    """Convert back to UUID."""
    padding = '=' * (4 - len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + padding))
 
u = uuid.uuid4()
short = uuid_to_short(u)
print(short)  # 'VQ6EAOKbQdSnFkRmVUQAAA' (22 chars)
print(short_to_uuid(short) == u)  # True

Pattern 4: Request Tracing

Add UUIDs to every request for debugging:

import uuid
import logging
 
def handle_request(request):
    request_id = str(uuid.uuid4())
    
    # Add to all log messages
    logger = logging.getLogger(__name__)
    logger.info(f"[{request_id}] Processing request")
    
    # Add to response headers
    response.headers['X-Request-ID'] = request_id
    
    return response

Quick Reference

import uuid
 
# Generate (choose one)
uuid.uuid4()                          # Random - default choice
uuid.uuid5(namespace, name)           # Deterministic from input
uuid.uuid1()                          # Time + MAC (privacy risk!)
 
# Parse
uuid.UUID('550e8400-e29b-41d4-...')   # From string
uuid.UUID(hex='550e8400...')          # From hex
uuid.UUID(bytes=b'...')               # From bytes
 
# Convert
str(u)                                # Standard string with dashes
u.hex                                 # No dashes
u.bytes                               # 16 bytes
u.int                                 # Integer
 
# Properties
u.version                             # 1, 3, 4, or 5
u.variant                             # Usually RFC_4122

What I Wish I Knew Earlier

Start with uuid4. It's the safe default.
Store natively. PostgreSQL's UUID type, not VARCHAR.
uuid1 leaks info. Avoid it unless you understand the tradeoff.
UUIDs aren't secrets. Use secrets module for tokens.
Collisions won't happen. Seriously, don't worry about it.

The uuid module is one of those stdlib gems that just works. Pick the right version, use native database types, and you'll rarely think about it again.

React to this post:

#What's a UUID?

#The Four UUID Versions (and When to Use Each)

#uuid4: Your Default Choice

#uuid5: When You Need Determinism

#uuid1: The Dangerous One

#uuid3: Legacy, Avoid It

#Creating UUIDs from Strings

#Validation Pattern

#UUID Attributes: hex, bytes, int

#UUIDs in Databases

#The Right Way: Native UUID Columns

#The Wrong Way: VARCHAR

#SQLite: Store as BLOB or Text

#UUIDs in APIs

#Custom JSON Encoder

#Pydantic (My Preferred Approach)

#Security Considerations

#uuid1 Leaks Your MAC Address

#UUIDs Aren't Secrets

#UUID Collision Probability

#Common Patterns

#Pattern 1: Auto-generated IDs with Dataclasses

#Pattern 2: Derived IDs

#Pattern 3: Short URL-Safe IDs

#Pattern 4: Request Tracing

#Quick Reference

#What I Wish I Knew Earlier

Keep Reading

Need help shipping fast?