Good logs save you at 3 AM. Bad logs make 3 AM worse. Here's how to log effectively.
## Use Log Levels Correctly
DEBUG: Verbose details for development. Never in production.

```python
logger.debug(f"Cache lookup for key={key}, hit={hit}")
```

INFO: Normal operations worth recording.

```python
logger.info(f"User {user_id} logged in")
```

WARNING: Something unexpected but recoverable.

```python
logger.warning(f"Retry attempt {attempt}/3 for API call")
```

ERROR: Something failed and needs attention.

```python
logger.error(f"Payment failed for order {order_id}", exc_info=True)
```

CRITICAL: System is broken, wake someone up.

```python
logger.critical("Database connection pool exhausted")
```

## Structure Your Logs
Plain text is hard to search. Use structured logging:
```python
# Bad
logger.info(f"User {user_id} purchased {product} for ${amount}")

# Good
logger.info(
    "Purchase completed",
    extra={
        "user_id": user_id,
        "product_id": product.id,
        "amount": amount,
        "currency": "USD",
    },
)
```

Structured logs become queryable data:

```sql
SELECT * FROM logs WHERE user_id = '123' AND amount > 100;
```

## What to Log
Log at boundaries:
- Incoming requests
- Outgoing API calls
- Database queries (in debug)
- Background job starts/completions
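Boundary logging is easy to standardize with a small decorator so every call gets the same start/complete/fail treatment. This is a sketch, not a prescribed pattern; `log_boundary` and `fetch_user` are hypothetical names:

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def log_boundary(func):
    """Log the start, completion, and duration of a boundary call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        logger.info("Call started", extra={"callee": func.__name__})
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.error("Call failed", extra={"callee": func.__name__}, exc_info=True)
            raise
        duration_ms = (time.monotonic() - start) * 1000
        logger.info("Call completed", extra={"callee": func.__name__, "duration_ms": duration_ms})
        return result
    return wrapper

@log_boundary
def fetch_user(user_id):
    # Stand-in for an outgoing API call or database query
    return {"id": user_id}
```

One decorator, and every boundary call is logged with its name and duration instead of each call site inventing its own wording.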
Log decisions:
```python
if user.is_premium:
    logger.info("Applying premium discount", extra={"user_id": user.id})
    apply_discount()
```

Log failures with context:
```python
except PaymentError as e:
    logger.error(
        "Payment processing failed",
        extra={
            "order_id": order.id,
            "amount": order.total,
            "error_code": e.code,
        },
        exc_info=True,
    )
```

## What Not to Log
Secrets:

```python
# NEVER
logger.info(f"Authenticating with password={password}")
logger.info(f"API key: {api_key}")
```

High-volume noise:

```python
# Don't log inside tight loops
for item in million_items:
    logger.debug(f"Processing {item}")  # A million log lines
```

Personally identifiable information (PII):
```python
# Bad
logger.info(f"User email: {user.email}, SSN: {user.ssn}")

# Good
logger.info("User created", extra={"user_id": user.id})
```

## Add Request Context
Trace requests across your system:
```python
import logging
import uuid
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar("request_id")

# Middleware
def add_request_id(request):
    request_id.set(str(uuid.uuid4()))

# In your logger
class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get("")
        return True
```

Now every log line includes a request_id. Follow one request through the entire system.
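To show the filter in action, here is one way to wire it into a handler, with the definitions repeated so the snippet runs on its own (names like the `"app"` logger are illustrative):

```python
import logging
import uuid
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar("request_id")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        # Fall back to "-" when no request is active
        record.request_id = request_id.get("-")
        return True

handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
# %(request_id)s is available because the filter sets it on every record
handler.setFormatter(logging.Formatter("%(asctime)s %(request_id)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

request_id.set(str(uuid.uuid4()))
logger.info("Handling request")
```

The filter runs on every record the handler sees, so no call site has to remember to pass the ID.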
## Log Configuration
```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Quiet noisy libraries
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("sqlalchemy").setLevel(logging.WARNING)
```

## Production vs Development
Development: DEBUG level, console output, readable format.
Production: INFO level, JSON format, shipped to log aggregator.
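The standard library doesn't ship a JSON formatter, so `JsonFormatter` has to come from somewhere: a library like python-json-logger, or a minimal hand-rolled version along these lines (a sketch, with an assumed minimal field set):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

One JSON object per line is what most log aggregators expect to ingest.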
```python
import os

if os.getenv("ENV") == "production":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())  # e.g. from python-json-logger
    logger.addHandler(handler)
```

## Log Aggregation
Logs on disk don't help when you have 50 servers. Use a log aggregator:
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Datadog
- Papertrail
- CloudWatch Logs
Search across all logs. Set up alerts for error patterns.
## My Rules
- Log levels matter. Use them consistently.
- Structure everything. JSON beats plain text.
- Add context. Request ID, user ID, relevant data.
- No secrets. Ever.
- Test your logs. If something fails, do your logs tell you why?
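The last rule can be enforced in your test suite: Python's unittest provides `assertLogs`, which fails if the expected record is never emitted. A sketch, where `charge` and the `"payments"` logger are hypothetical:

```python
import logging
import unittest

logger = logging.getLogger("payments")

def charge(amount):
    if amount <= 0:
        logger.error("Charge rejected", extra={"amount": amount})
        return False
    return True

class ChargeLogTest(unittest.TestCase):
    def test_failure_is_logged(self):
        # assertLogs fails the test if no ERROR record appears on "payments"
        with self.assertLogs("payments", level="ERROR") as captured:
            self.assertFalse(charge(-5))
        self.assertIn("Charge rejected", captured.output[0])
```

If the failure path stops logging, this test breaks before production does.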
When production breaks, logs are your debugger. Invest in them.