Files are everywhere. Config files, logs, data exports, caches. As a junior engineer, I thought file handling was simple—open(), write some bytes, done. Then I corrupted a config file during a crash and lost user data.
Here's everything I've learned about doing it right.
open() and Context Managers
The old way:
f = open("data.txt", "r")
content = f.read()
f.close()  # Easy to forget

The problem: if an exception happens before close(), your file handle leaks. Do this enough times and your program runs out of file descriptors.
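Before with, the way to guarantee cleanup was try/finally. A minimal sketch (the tempfile setup is just there so the snippet runs anywhere):

```python
import os
import tempfile

# Set up a throwaway file so the snippet is self-contained
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as setup:
    setup.write("hello")

# The pre-with way to guarantee cleanup: try/finally
f = open(path, "r", encoding="utf-8")
try:
    content = f.read()
finally:
    f.close()  # Runs even if read() raises

print(content)   # hello
print(f.closed)  # True
```

It works, but it's verbose, which is exactly why the with statement exists.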
The right way:
with open("data.txt", "r") as f:
    content = f.read()
# File is automatically closed, even if an exception occurs

The with statement is a context manager. It guarantees cleanup happens no matter what. I use it for every file operation now.
# Multiple files at once
with open("input.txt", "r") as src, open("output.txt", "w") as dst:
    dst.write(src.read())

File Modes
The second argument to open() is the mode. Here's what they mean:
# Reading
open("file.txt", "r") # Read text (default)
open("file.txt", "rb") # Read binary (images, PDFs, etc.)
# Writing
open("file.txt", "w") # Write text (overwrites existing!)
open("file.txt", "wb") # Write binary
open("file.txt", "a") # Append (adds to end)
# Reading and writing
open("file.txt", "r+") # Read and write (file must exist)
open("file.txt", "w+") # Read and write (truncates file)
open("file.txt", "a+")  # Read and append

Critical lesson I learned the hard way: "w" mode truncates the file immediately when you open it. If your script crashes before writing new data, you've lost everything.
# Dangerous pattern
with open("config.json", "w") as f:
    # Script crashes here...
    f.write(json.dumps(config))  # Never runs

# config.json is now empty!

More on how to fix this later with atomic writes.
Reading Files
Three main methods, each with different use cases:
read() - Entire File
with open("data.txt", "r") as f:
    content = f.read()  # Entire file as one string

Good for small files. Bad for large files: it loads everything into memory.
# Read specific number of characters
with open("data.txt", "r") as f:
    first_100 = f.read(100)  # First 100 characters
    next_100 = f.read(100)   # Next 100 characters

readline() - One Line at a Time
with open("log.txt", "r") as f:
    first_line = f.readline()   # Includes \n
    second_line = f.readline()

Useful when you need specific lines or want manual control.
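For example, consuming a header line manually before handing the rest to a loop (the file and its contents here are just for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("name,age\nalice,30\nbob,25\n")

with open(path, "r", encoding="utf-8") as f:
    header = f.readline().strip()        # grab the first line by hand
    rows = [line.strip() for line in f]  # then iterate over the rest

print(header)  # name,age
print(rows)    # ['alice,30', 'bob,25']
```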
readlines() - All Lines as List
with open("data.txt", "r") as f:
    lines = f.readlines()  # ["line1\n", "line2\n", ...]
# Process lines
for line in lines:
    print(line.strip())  # Remove trailing newline

Loads entire file into memory. For large files, iterate instead.
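Note that readlines() keeps the trailing newlines, while read().splitlines() drops them. A sketch of the difference:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line1\nline2\n")

with open(path, "r", encoding="utf-8") as f:
    kept = f.readlines()              # newlines included
with open(path, "r", encoding="utf-8") as f:
    stripped = f.read().splitlines()  # newlines dropped

print(kept)      # ['line1\n', 'line2\n']
print(stripped)  # ['line1', 'line2']
```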
Iterating (Memory-Efficient)
# Best for large files
with open("huge.log", "r") as f:
    for line in f:  # One line at a time, low memory
        process(line)

This is my default pattern now. Works for files of any size.
Writing Files
write() - Write a String
with open("output.txt", "w") as f:
    f.write("Hello, world!\n")
    f.write("Second line\n")

Note: write() doesn't add newlines automatically.
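A two-line sketch of what that means in practice:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello")
    f.write("World")  # no separator added between the two writes

with open(path, "r", encoding="utf-8") as f:
    result = f.read()

print(result)  # HelloWorld, all on one line
```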
writelines() - Write Multiple Strings
lines = ["line 1\n", "line 2\n", "line 3\n"]
with open("output.txt", "w") as f:
    f.writelines(lines)

Despite the name, it doesn't add newlines. Include them yourself.
print() to a File
with open("output.txt", "w") as f:
    print("Hello", file=f)  # Includes newline
    print("World", file=f)

I sometimes use this when I want automatic newlines.
Encoding: The UTF-8 Story
This bug cost me hours:
with open("data.txt", "r") as f:
    content = f.read()  # UnicodeDecodeError!

The file had non-ASCII characters, and Python was using the wrong encoding.
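The failure is easy to reproduce. The default encoding depends on the platform, so this sketch forces ascii to make the mismatch deterministic:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("café")  # non-ASCII character

error = None
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as e:
    error = e

print(type(error).__name__)  # UnicodeDecodeError
```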
Always specify encoding:
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

For writing:
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("日本語テキスト")  # Works correctly

Handling Unknown Encodings
Sometimes you get files from unknown sources:
# Replace undecodable characters
with open("messy.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()  # Bad chars become �
# Ignore undecodable characters
with open("messy.txt", "r", encoding="utf-8", errors="ignore") as f:
    content = f.read()  # Bad chars disappear

Detecting Encoding
import chardet
with open("unknown.txt", "rb") as f:
    raw = f.read()
result = chardet.detect(raw)
encoding = result["encoding"]
with open("unknown.txt", "r", encoding=encoding) as f:
    content = f.read()

Install with pip install chardet.
Atomic Writes: Don't Lose Data
Remember that dangerous pattern? Here's the fix:
import tempfile
import os
def write_atomic(path, content):
    """Write to a temp file, then rename. Either succeeds completely or fails completely."""
    dir_name = os.path.dirname(path) or "."
    # Write to temporary file in same directory
    with tempfile.NamedTemporaryFile(
        mode="w",
        dir=dir_name,
        delete=False,
        encoding="utf-8"
    ) as tmp:
        tmp.write(content)
        tmp_path = tmp.name
    # Atomic rename
    os.replace(tmp_path, path)

Why this works:
- Write goes to a temp file first
- os.replace() is atomic on most filesystems
- If a crash happens during the write, the original file is untouched
- If a crash happens during the rename, the temp file still has the complete data
For JSON:
import json
def save_json_atomic(path, data):
    content = json.dumps(data, indent=2, ensure_ascii=False)
    write_atomic(path, content)

This pattern has saved me multiple times in production.
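To see the guarantee in action, here's a sketch with a deliberately broken variant that crashes after the temp-file write but before the rename (write_atomic is redeclared so the snippet runs standalone):

```python
import os
import tempfile

def write_atomic(path, content):
    dir_name = os.path.dirname(path) or "."
    with tempfile.NamedTemporaryFile(mode="w", dir=dir_name,
                                     delete=False, encoding="utf-8") as tmp:
        tmp.write(content)
        tmp_path = tmp.name
    os.replace(tmp_path, path)

def write_atomic_crashy(path, content):
    # Same shape as write_atomic, but simulates a crash before the rename
    dir_name = os.path.dirname(path) or "."
    with tempfile.NamedTemporaryFile(mode="w", dir=dir_name,
                                     delete=False, encoding="utf-8") as tmp:
        tmp.write(content[: len(content) // 2])  # partial write
        raise RuntimeError("simulated crash mid-write")

path = os.path.join(tempfile.mkdtemp(), "config.json")
write_atomic(path, '{"version": 1}')

try:
    write_atomic_crashy(path, '{"version": 2}')
except RuntimeError:
    pass

with open(path, encoding="utf-8") as f:
    survived = f.read()
print(survived)  # {"version": 1}, the original is untouched
```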
Working with Large Files
Loading a 10GB log file into memory? Bad idea.
Line-by-Line Processing
def process_large_file(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield process_line(line)

Memory usage stays constant regardless of file size.
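A usage sketch, with a trivial stand-in for process_line (which is otherwise left undefined):

```python
import os
import tempfile

def process_line(line):
    # Hypothetical per-line transform, just for the demo
    return line.strip().upper()

def process_large_file(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield process_line(line)

path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(path, "w", encoding="utf-8") as f:
    f.write("error: disk full\nok\n")

results = list(process_large_file(path))
print(results)  # ['ERROR: DISK FULL', 'OK']
```

Because it's a generator, nothing is processed until you iterate, and only one line is held in memory at a time.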
Chunked Reading for Binary
def process_binary_chunks(path, chunk_size=8192):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

Counting Lines Efficiently
def count_lines(path):
    count = 0
    with open(path, "rb") as f:  # Binary is faster
        for _ in f:
            count += 1
    return count

Searching Without Loading
def find_line_containing(path, pattern):
    with open(path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            if pattern in line:
                return line_num, line
    return None

Common Patterns
Check if File Exists
from pathlib import Path
path = Path("config.json")
if path.exists():
    content = path.read_text(encoding="utf-8")

Read JSON
import json
from pathlib import Path
def load_json(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))

def save_json(path, data):
    Path(path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False),
        encoding="utf-8"
    )

Read YAML
import yaml
from pathlib import Path
def load_yaml(path):
    return yaml.safe_load(Path(path).read_text(encoding="utf-8"))

Read CSV
import csv
def read_csv(path):
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        return list(reader)

File as Configuration
from pathlib import Path
import json
CONFIG_PATH = Path("~/.myapp/config.json").expanduser()
def load_config():
    if CONFIG_PATH.exists():
        return json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
    return {}

def save_config(config):
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    CONFIG_PATH.write_text(
        json.dumps(config, indent=2),
        encoding="utf-8"
    )

Temporary Files
import tempfile
# Auto-deleted when closed
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=True) as f:
    f.write("temporary data")
    f.flush()
    # Use f.name to get the path
    process_file(f.name)
# File is automatically deleted

Lock Files (Simple)
from contextlib import contextmanager
from pathlib import Path

@contextmanager  # Required: a bare generator isn't a context manager
def with_lock(path):
    lock_path = Path(str(path) + ".lock")
    # Check for existing lock (note: exists() then touch() isn't race-free;
    # fine for a single process, not for true concurrency)
    if lock_path.exists():
        raise RuntimeError("File is locked")
    try:
        # Create lock
        lock_path.touch()
        yield
    finally:
        # Release lock
        lock_path.unlink(missing_ok=True)

# Usage:
# with with_lock("config.json"):
#     ...do the protected file work...

My Guidelines
- Always use with - never manual close
- Always specify encoding - utf-8 is almost always right
- Use atomic writes for important data - config, state, anything you can't lose
- Iterate for large files - don't load into memory
- Use pathlib - it's cleaner than os.path
- Handle missing files gracefully - check exists() or catch FileNotFoundError
File handling isn't glamorous, but getting it right prevents real pain. One corrupted config file taught me more than any tutorial.
Have a file handling horror story? I'd love to hear it. Still learning here.