When I first discovered Python's dataclasses module, I thought it was just a shortcut for writing __init__ and __repr__. Then I tried to do something slightly complex and realized I'd only scratched the surface.

This is what I wish I'd known earlier.

The Basics (Quick Recap)

from dataclasses import dataclass
 
@dataclass
class User:
    name: str
    email: str
    age: int = 0

This generates __init__, __repr__, __eq__, and more. Nice. But the real power is in what comes next.

Frozen: True Immutability

I was confused when I first saw dataclass instances being mutated in places I didn't expect. The problem? By default, nothing stops callers from reassigning fields after creation.

Enter frozen=True:

from dataclasses import dataclass
 
@dataclass(frozen=True)
class Config:
    api_key: str
    timeout: int = 30
    retries: int = 3
 
config = Config(api_key="secret123")
config.timeout = 60  # FrozenInstanceError!

This makes your dataclass immutable—any attempt to modify it raises an exception.

What I learned: Frozen dataclasses are perfect for configuration objects, value objects, and anything that should never change after creation. They're also hashable by default, which means you can use them as dictionary keys or in sets.

# This works because frozen=True makes it hashable
configs = {Config(api_key="prod"), Config(api_key="staging")}

The Catch: Nested Mutability

Here's something that tripped me up:

@dataclass(frozen=True)
class Settings:
    name: str
    options: list  # Uh oh
 
settings = Settings(name="app", options=[1, 2, 3])
settings.options.append(4)  # This works! The list is still mutable

Frozen only prevents reassignment of the field itself, not mutation of mutable objects inside. For true deep immutability, use tuples, frozensets, or other immutable types for the fields themselves.
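One way to get closer to deep immutability is to make the nested container a tuple as well. A minimal sketch, reusing the Settings shape from above with options switched to a tuple:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    name: str
    options: tuple = ()  # tuple instead of list: no in-place mutation possible

settings = Settings(name="app", options=(1, 2, 3))
# settings.options.append(4)  -> AttributeError: tuples have no append
```

With every field immutable, the hashability that frozen=True gives you actually holds up, too.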

field() Options: Fine-Grained Control

The field() function is where dataclasses get flexible. I ignored it for too long.

default_factory: Avoiding the Mutable Default Trap

Every Python developer learns this lesson eventually:

# WRONG - all instances share the same list!
@dataclass
class BadTask:
    name: str
    tags: list = []  # ValueError anyway, but you get the idea
 
# RIGHT - each instance gets its own list
from dataclasses import dataclass, field
 
@dataclass
class Task:
    name: str
    tags: list = field(default_factory=list)

default_factory takes a callable that produces a fresh default value for each instance.

# More complex factories
from uuid import uuid4
from datetime import datetime
 
@dataclass
class Event:
    name: str
    id: str = field(default_factory=lambda: str(uuid4()))
    created_at: datetime = field(default_factory=datetime.now)

repr, compare, hash: Controlling Behavior

Sometimes you don't want a field to show up in the repr or affect equality:

@dataclass
class Document:
    title: str
    content: str
    # Internal tracking - don't show in repr or use in comparison
    _cache: dict = field(default_factory=dict, repr=False, compare=False)
 
doc1 = Document("Hello", "World")
doc2 = Document("Hello", "World")
doc1._cache["key"] = "value"
 
print(doc1)  # Document(title='Hello', content='World')
print(doc1 == doc2)  # True - _cache is ignored

What I learned: These options are invaluable for:

  • repr=False: Hiding sensitive data or internal state
  • compare=False: Excluding metadata from equality checks
  • hash=False: Excluding fields from hash computation

__post_init__: Validation and Computed Fields

This is where I had my "aha" moment with dataclasses. __post_init__ runs after the auto-generated __init__:

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # Computed, not passed to __init__
    
    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Dimensions must be positive")
        self.area = self.width * self.height
 
rect = Rectangle(10.0, 5.0)
print(rect.area)  # 50.0
 
Rectangle(-1, 5)  # ValueError: Dimensions must be positive

Validation Patterns

import re
 
@dataclass
class Email:
    address: str
    
    def __post_init__(self):
        pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
        if not re.match(pattern, self.address):
            raise ValueError(f"Invalid email: {self.address}")
 
Email("user@example.com")  # Works
Email("not-an-email")  # ValueError

Transforming Input

@dataclass
class User:
    name: str
    email: str
    
    def __post_init__(self):
        self.name = self.name.strip().title()
        self.email = self.email.strip().lower()
 
user = User("  john doe  ", "JOHN@EXAMPLE.COM")
print(user)  # User(name='John Doe', email='john@example.com')

Inheritance: Patterns and Gotchas

Dataclass inheritance works, but there are traps.

The Field Order Problem

This broke my code more than once:

@dataclass
class Animal:
    name: str
    species: str = "Unknown"  # Has default
 
@dataclass
class Dog(Animal):
    breed: str  # No default - ERROR!

This raises TypeError: non-default argument 'breed' follows default argument. Parent defaults "pollute" child field ordering.

Solution 1: Give all child fields defaults

@dataclass
class Dog(Animal):
    breed: str = "Unknown"

Solution 2: Use field() with an explicit default

@dataclass
class Dog(Animal):
    breed: str = field(default="Unknown")

Solution 3: Rethink your hierarchy

Sometimes inheritance isn't the right tool. Composition might be cleaner.

Method Inheritance Works Fine

@dataclass
class BaseModel:
    id: int
    
    def save(self):
        print(f"Saving {self.__class__.__name__} with id={self.id}")
 
@dataclass
class Product(BaseModel):
    name: str
    price: float
 
product = Product(1, "Widget", 9.99)
product.save()  # "Saving Product with id=1"

Slots: Memory Optimization (Python 3.10+)

I was skeptical until I actually measured this.

@dataclass(slots=True)
class Point:
    x: float
    y: float

With slots=True, Python uses __slots__ instead of a __dict__ for instance attributes. Benefits:

  1. Less memory - no per-instance dictionary
  2. Faster attribute access - direct offset lookup
  3. Prevents accidental attribute creation

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float
 
p = SlottedPoint(1.0, 2.0)
p.z = 3.0  # AttributeError: 'SlottedPoint' object has no attribute 'z'

Measuring the Difference

import sys
from dataclasses import dataclass
 
@dataclass
class RegularPoint:
    x: float
    y: float
 
@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float
 
regular = RegularPoint(1.0, 2.0)
slotted = SlottedPoint(1.0, 2.0)
 
print(sys.getsizeof(regular))  # e.g. 48 bytes
print(sys.getsizeof(slotted))  # e.g. 32 bytes (rough numbers, vary by Python version)

Keep in mind that sys.getsizeof doesn't count the regular instance's separate __dict__, so the real gap is larger than these numbers suggest. For a million instances, that's meaningful memory savings.

What I learned: Use slots=True when you're creating many instances and don't need dynamic attribute assignment.

Typing Integration: ClassVar and InitVar

ClassVar: Class-Level Constants

Fields that belong to the class, not instances:

from dataclasses import dataclass
from typing import ClassVar
 
@dataclass
class Counter:
    name: str
    count: int = 0
    total_created: ClassVar[int] = 0  # Not an instance field
    
    def __post_init__(self):
        Counter.total_created += 1
 
c1 = Counter("first")
c2 = Counter("second")
print(Counter.total_created)  # 2
print(c1.total_created)  # 2 (instance lookup falls back to the class attribute)

ClassVar fields don't appear in __init__, aren't compared, and don't show in repr.
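You can verify this with dataclasses.fields(), which reports only true instance fields:

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class Counter:
    name: str
    count: int = 0
    total_created: ClassVar[int] = 0  # class-level, not an instance field

# ClassVar pseudo-fields are excluded from fields()
print([f.name for f in fields(Counter)])  # ['name', 'count']
```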

InitVar: Init-Only Variables

Pass data to __post_init__ without storing it:

from dataclasses import dataclass, field, InitVar
 
@dataclass
class Password:
    hash: str = field(init=False)
    raw_password: InitVar[str]
    
    def __post_init__(self, raw_password: str):
        # raw_password is NOT stored as an attribute
        self.hash = self._hash(raw_password)
    
    def _hash(self, password: str) -> str:
        import hashlib
        return hashlib.sha256(password.encode()).hexdigest()
 
pw = Password(raw_password="secret123")
print(pw.hash)  # 64-character hex digest
print(hasattr(pw, 'raw_password'))  # False - it's not stored!

What I learned: InitVar is perfect for sensitive data you need to process but shouldn't persist on the instance.

When to Use Alternatives

Dataclasses are great, but not always the right choice.

namedtuple: Simple, Immutable, Iterable

from collections import namedtuple
 
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
x, y = p  # Unpacking works!
print(p[0])  # Indexing works!

Use namedtuple when:

  • You need tuple-like behavior (iteration, unpacking, indexing)
  • Simple immutable data without methods
  • Memory efficiency matters (even lighter than slotted dataclasses)

attrs: More Features, More Control

import attr
 
@attr.s(auto_attribs=True)
class User:
    name: str
    email: str = attr.ib(validator=attr.validators.matches_re(r'.+@.+'))

Use attrs when:

  • You need built-in validators
  • You want converters (auto-transform input)
  • You need more sophisticated inheritance
  • You're on Python < 3.7 (attrs predates dataclasses)

Pydantic: Runtime Validation, JSON Ready

from pydantic import BaseModel, EmailStr
 
class User(BaseModel):
    name: str
    email: EmailStr  # Validates email format
    age: int
 
# Automatic parsing and validation
user = User(name="John", email="john@example.com", age="25")
print(user.age)  # 25 (int, auto-converted)
print(user.model_dump_json())  # JSON serialization built-in

Use Pydantic when:

  • Parsing external data (APIs, config files, user input)
  • You need JSON serialization/deserialization
  • Complex validation is core to your use case
  • You're building FastAPI applications

My Decision Framework

Need immutability only? → namedtuple or frozen dataclass
Internal domain objects? → dataclass
External data validation? → Pydantic
Complex validation + Python < 3.7 support? → attrs

Common Patterns

The Builder Pattern

For complex objects with many optional parameters:

@dataclass
class QueryBuilder:
    table: str
    columns: list = field(default_factory=lambda: ["*"])
    where: list = field(default_factory=list)
    limit: int | None = None
    
    def select(self, *cols):
        return QueryBuilder(
            table=self.table,
            columns=list(cols),
            where=self.where.copy(),
            limit=self.limit
        )
    
    def filter(self, condition):
        return QueryBuilder(
            table=self.table,
            columns=self.columns.copy(),
            where=self.where + [condition],
            limit=self.limit
        )
    
    def take(self, n):
        return QueryBuilder(
            table=self.table,
            columns=self.columns.copy(),
            where=self.where.copy(),
            limit=n
        )
    
    def build(self) -> str:
        cols = ", ".join(self.columns)
        sql = f"SELECT {cols} FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql
 
query = (QueryBuilder("users")
    .select("id", "name", "email")
    .filter("active = true")
    .filter("age > 18")
    .take(10)
    .build())
# "SELECT id, name, email FROM users WHERE active = true AND age > 18 LIMIT 10"

Config Objects

from dataclasses import dataclass, field
from typing import ClassVar
import os
 
@dataclass(frozen=True)
class AppConfig:
    ENV_PREFIX: ClassVar[str] = "APP_"
    
    database_url: str
    debug: bool = False
    max_connections: int = 10
    
    @classmethod
    def from_env(cls):
        return cls(
            database_url=os.environ[f"{cls.ENV_PREFIX}DATABASE_URL"],
            debug=os.environ.get(f"{cls.ENV_PREFIX}DEBUG", "").lower() == "true",
            max_connections=int(os.environ.get(f"{cls.ENV_PREFIX}MAX_CONNECTIONS", 10))
        )

Data Transfer Objects (DTOs)

from dataclasses import dataclass, asdict, field
from datetime import datetime
 
@dataclass
class UserDTO:
    id: int
    name: str
    email: str
    created_at: datetime = field(default_factory=datetime.now)
    
    def to_dict(self) -> dict:
        data = asdict(self)
        data['created_at'] = self.created_at.isoformat()
        return data
    
    @classmethod
    def from_dict(cls, data: dict) -> 'UserDTO':
        data = dict(data)  # copy so we don't mutate the caller's dict
        if isinstance(data.get('created_at'), str):
            data['created_at'] = datetime.fromisoformat(data['created_at'])
        return cls(**data)
 
# Serialize
user = UserDTO(1, "John", "john@example.com")
payload = user.to_dict()  # Ready for JSON
 
# Deserialize
user_copy = UserDTO.from_dict(payload)
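One behavior worth knowing about asdict: it recurses into nested dataclasses (and into lists, tuples, and dicts containing them), which matters once DTOs contain other DTOs. A small illustration with hypothetical Address/Person classes:

```python
from dataclasses import dataclass, asdict

@dataclass
class Address:
    city: str

@dataclass
class Person:
    name: str
    address: Address  # nested dataclass

p = Person("John", Address("Oslo"))
print(asdict(p))  # {'name': 'John', 'address': {'city': 'Oslo'}}
```

That recursion is usually what you want for JSON payloads, though note that asdict deep-copies values as it goes.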

What I Learned (Summary)

  1. frozen=True creates true immutability and enables hashing—use it for configs and value objects
  2. field() gives you fine-grained control over defaults, representation, and comparison
  3. __post_init__ is your hook for validation and computed fields—use it liberally
  4. Inheritance has footguns—watch the field ordering and consider composition
  5. slots=True (3.10+) saves memory and prevents attribute typos
  6. ClassVar and InitVar solve specific typing needs—class-level and init-only data
  7. Know when to reach for alternatives—Pydantic for validation, attrs for features, namedtuple for simplicity

Dataclasses aren't magic. They're just Python generating boilerplate for you. But understanding the options turns them from a nice convenience into a powerful tool for clean, maintainable code.


Questions or patterns I missed? I'm always learning—reach out if you've got tricks I should know about.
