Files are everywhere. Config files, logs, data exports, caches. As a junior engineer, I thought file handling was simple—open(), write some bytes, done. Then I corrupted a config file during a crash and lost user data.

Here's everything I've learned about doing it right.

open() and Context Managers

The old way:

f = open("data.txt", "r")
content = f.read()
f.close()  # Easy to forget

The problem: if an exception happens before close(), your file handle leaks. Do this enough times and your program runs out of file descriptors.

The right way:

with open("data.txt", "r") as f:
    content = f.read()
# File is automatically closed, even if an exception occurs

The with statement is a context manager. It guarantees cleanup happens no matter what. I use it for every file operation now.
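You can watch the guarantee work: even when the body raises, the file object ends up closed. A small demo using a throwaway temp file:

```python
import os
import tempfile

# Make a scratch file to read from
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, "r") as f:
        raise ValueError("boom: exception mid-read")
except ValueError:
    pass

print(f.closed)  # → True: the handle was closed despite the exception
os.remove(path)
```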

# Multiple files at once
with open("input.txt", "r") as src, open("output.txt", "w") as dst:
    dst.write(src.read())

File Modes

The second argument to open() is the mode. Here's what they mean:

# Reading
open("file.txt", "r")   # Read text (default)
open("file.txt", "rb")  # Read binary (images, PDFs, etc.)
 
# Writing
open("file.txt", "w")   # Write text (overwrites existing!)
open("file.txt", "wb")  # Write binary
open("file.txt", "a")   # Append (adds to end)
 
# Reading and writing
open("file.txt", "r+")  # Read and write (file must exist)
open("file.txt", "w+")  # Read and write (truncates file)
open("file.txt", "a+")  # Read and append
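One mode the list omits: "x" (exclusive creation). It creates the file and raises FileExistsError if the file already exists, so it can never truncate anything. A handy guard against accidental overwrites:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fresh.txt")

with open(path, "x") as f:      # "x": create, never truncate
    f.write("first write\n")

try:
    open(path, "x")             # file exists, so this raises
except FileExistsError:
    print("refused to overwrite")
```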

Critical lesson I learned the hard way: "w" mode truncates the file immediately when you open it. If your script crashes before writing new data, you've lost everything.

# Dangerous pattern
with open("config.json", "w") as f:
    # Script crashes here...
    f.write(json.dumps(config))  # Never runs
# config.json is now empty!

The fix is atomic writes, covered later in this post.

Reading Files

Three main methods, each with different use cases:

read() - Entire File

with open("data.txt", "r") as f:
    content = f.read()  # Entire file as one string

Good for small files. Bad for large files—it loads everything into memory.

# Read specific number of characters
with open("data.txt", "r") as f:
    first_100 = f.read(100)  # First 100 characters
    next_100 = f.read(100)   # Next 100 characters

readline() - One Line at a Time

with open("log.txt", "r") as f:
    first_line = f.readline()   # Includes \n
    second_line = f.readline()

Useful when you need specific lines or want manual control.

readlines() - All Lines as List

with open("data.txt", "r") as f:
    lines = f.readlines()  # ["line1\n", "line2\n", ...]
 
# Process lines
for line in lines:
    print(line.strip())  # Remove trailing newline

Loads entire file into memory. For large files, iterate instead.

Iterating (Memory-Efficient)

# Best for large files
with open("huge.log", "r") as f:
    for line in f:  # One line at a time, low memory
        process(line)

This is my default pattern now. Works for files of any size.

Writing Files

write() - Write a String

with open("output.txt", "w") as f:
    f.write("Hello, world!\n")
    f.write("Second line\n")

Note: write() doesn't add newlines automatically.

writelines() - Write Multiple Strings

lines = ["line 1\n", "line 2\n", "line 3\n"]
 
with open("output.txt", "w") as f:
    f.writelines(lines)

Despite the name, it doesn't add newlines. Include them yourself.

with open("output.txt", "w") as f:
    print("Hello", file=f)  # Includes newline
    print("World", file=f)

I sometimes use this when I want automatic newlines.

Encoding: The UTF-8 Story

This bug cost me hours:

with open("data.txt", "r") as f:
    content = f.read()  # UnicodeDecodeError!

The file contained non-ASCII characters, and Python fell back to the platform's default encoding, which didn't match how the file was written.
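If you're curious which default your platform would have used, locale.getpreferredencoding(False) reports it. This is what open() falls back to on most setups (Python's opt-in UTF-8 mode changes the picture):

```python
import locale

# The encoding open() uses when you omit encoding=
print(locale.getpreferredencoding(False))
```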

Always specify encoding:

with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

For writing:

with open("data.txt", "w", encoding="utf-8") as f:
    f.write("日本語テキスト")  # Works correctly

Handling Unknown Encodings

Sometimes you get files from unknown sources:

# Replace undecodable characters
with open("messy.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()  # Bad chars become �
 
# Ignore undecodable characters  
with open("messy.txt", "r", encoding="utf-8", errors="ignore") as f:
    content = f.read()  # Bad chars disappear

Detecting Encoding

import chardet
 
with open("unknown.txt", "rb") as f:
    raw = f.read()
    result = chardet.detect(raw)
    encoding = result["encoding"] or "utf-8"  # detect() can return None
 
with open("unknown.txt", "r", encoding=encoding) as f:
    content = f.read()

Install with pip install chardet.

Atomic Writes: Don't Lose Data

Remember that dangerous pattern? Here's the fix:

import tempfile
import os
 
def write_atomic(path, content):
    """Write to a temp file, then rename. Either succeeds completely or fails completely."""
    dir_name = os.path.dirname(path) or "."
    
    # Write to temporary file in same directory
    with tempfile.NamedTemporaryFile(
        mode="w",
        dir=dir_name,
        delete=False,
        encoding="utf-8"
    ) as tmp:
        tmp.write(content)
        tmp_path = tmp.name
    
    # Atomic rename
    os.replace(tmp_path, path)

Why this works:

  1. Write goes to a temp file first
  2. os.replace() is atomic on most filesystems
  3. If crash happens during write, original file is untouched
  4. If crash happens during rename, temp file still has complete data
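One caveat: os.replace() protects you from a crash, but a rename alone doesn't guarantee the bytes have reached the disk. If you also need power-loss durability, flush and fsync the temp file before renaming. A sketch (write_atomic_durable is my name for this variant):

```python
import os
import tempfile

def write_atomic_durable(path, content):
    """Like write_atomic, but fsyncs before renaming so the data
    survives a power loss, not just a process crash."""
    dir_name = os.path.dirname(path) or "."

    with tempfile.NamedTemporaryFile(
        mode="w", dir=dir_name, delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(content)
        tmp.flush()                # push Python's buffer to the OS
        os.fsync(tmp.fileno())     # push the OS buffer to disk
        tmp_path = tmp.name

    # Atomic rename, same as before
    os.replace(tmp_path, path)
```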

For JSON:

import json
 
def save_json_atomic(path, data):
    content = json.dumps(data, indent=2, ensure_ascii=False)
    write_atomic(path, content)

This pattern has saved me multiple times in production.

Working with Large Files

Loading a 10GB log file into memory? Bad idea.

Line-by-Line Processing

def process_large_file(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield process_line(line)

Memory usage stays constant regardless of file size.

Chunked Reading for Binary

def process_binary_chunks(path, chunk_size=8192):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

Counting Lines Efficiently

def count_lines(path):
    count = 0
    with open(path, "rb") as f:  # Binary is faster
        for _ in f:
            count += 1
    return count

Searching Without Loading

def find_line_containing(path, pattern):
    with open(path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            if pattern in line:
                return line_num, line
    return None

Common Patterns

Check if File Exists

from pathlib import Path
 
path = Path("config.json")
if path.exists():
    content = path.read_text(encoding="utf-8")

Read JSON

import json
from pathlib import Path
 
def load_json(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))
 
def save_json(path, data):
    Path(path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False),
        encoding="utf-8"
    )

Read YAML

import yaml
from pathlib import Path
 
def load_yaml(path):
    return yaml.safe_load(Path(path).read_text(encoding="utf-8"))

Read CSV

import csv
 
def read_csv(path):
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        return list(reader)

File as Configuration

from pathlib import Path
import json
 
CONFIG_PATH = Path("~/.myapp/config.json").expanduser()
 
def load_config():
    if CONFIG_PATH.exists():
        return json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
    return {}
 
def save_config(config):
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    CONFIG_PATH.write_text(
        json.dumps(config, indent=2),
        encoding="utf-8"
    )

Temporary Files

import tempfile
 
# Auto-deleted when closed
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=True) as f:
    f.write("temporary data")
    f.flush()
    # Use f.name to get the path (caveat: on Windows, the file
    # can't be reopened by name while it's still open here)
    process_file(f.name)
# File is automatically deleted
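When scratch work involves several files, tempfile.TemporaryDirectory hands you a whole directory that's removed, contents and all, when the with block exits:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    scratch = Path(tmp_dir) / "work.txt"
    scratch.write_text("intermediate results", encoding="utf-8")
    # ... do work with scratch ...
# tmp_dir and everything inside it are gone now
```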

Lock Files (Simple)

from contextlib import contextmanager
from pathlib import Path
 
@contextmanager
def with_lock(path):
    lock_path = Path(str(path) + ".lock")
    
    # Check for existing lock
    if lock_path.exists():
        raise RuntimeError("File is locked")
    
    try:
        # Create lock
        lock_path.touch()
        yield
    finally:
        # Release lock
        lock_path.unlink(missing_ok=True)

My Guidelines

  1. Always use with - never manual close
  2. Always specify encoding - utf-8 is almost always right
  3. Use atomic writes for important data - config, state, anything you can't lose
  4. Iterate for large files - don't load into memory
  5. Use pathlib - it's cleaner than os.path
  6. Handle missing files gracefully - check exists or catch FileNotFoundError

File handling isn't glamorous, but getting it right prevents real pain. One corrupted config file taught me more than any tutorial.


Have a file handling horror story? I'd love to hear it. Still learning here.
