The hashlib module provides secure hash functions. Use it for checksums, data integrity, and (with care) password hashing.
Basic Hashing
import hashlib
# Hash a string
text = "Hello, World!"
hash_obj = hashlib.sha256(text.encode())
# Get the digest
print(hash_obj.hexdigest())
# 'dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f'
print(hash_obj.digest())  # Raw bytes

Common Algorithms
# SHA-256 (recommended for most uses)
hashlib.sha256(data)
# SHA-512 (longer hash, more security margin)
hashlib.sha512(data)
# SHA-1 (legacy, avoid for security)
hashlib.sha1(data)
# MD5 (broken for security; acceptable only for non-security checksums)
hashlib.md5(data)
# List all available
print(hashlib.algorithms_available)

File Checksums
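For a small file that fits comfortably in memory, one option is to read it in a single call and hash the bytes directly. The sketch below is illustrative only: `small_file_hash` is a hypothetical helper name, and the temporary file exists just to make the example runnable.

```python
import hashlib
import tempfile
from pathlib import Path


def small_file_hash(filepath: str) -> str:
    """Hash a whole file in one call (hypothetical helper; loads the file into memory)."""
    return hashlib.sha256(Path(filepath).read_bytes()).hexdigest()


# Illustration only: write a throwaway file, then hash it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"
    demo_path.write_bytes(b"Hello, World!")
    demo_digest = small_file_hash(str(demo_path))

print(demo_digest)
```

This approach holds the entire file in memory at once, so it is only reasonable when the file size is known to be small.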
For large files, read in chunks:
def file_hash(filepath: str, algorithm: str = 'sha256') -> str:
    """Calculate hash of a file."""
    h = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        while chunk := f.read(8192):
            h.update(chunk)
    return h.hexdigest()

# Usage
checksum = file_hash('large_file.iso')
print(checksum)

Verify a download:
def verify_checksum(filepath: str, expected: str, algorithm: str = 'sha256') -> bool:
    """Verify file matches expected checksum."""
    actual = file_hash(filepath, algorithm)
    return actual.lower() == expected.lower()

# Usage
if verify_checksum('download.zip', 'abc123...'):
    print("File integrity verified")
else:
    print("Checksum mismatch!")

Incremental Hashing
Build up a hash over multiple updates:
h = hashlib.sha256()
h.update(b"Hello, ")
h.update(b"World!")
print(h.hexdigest())
# Same digest as hashlib.sha256(b"Hello, World!").hexdigest()

Useful for streaming data:
import hashlib

def hash_stream(stream) -> str:
    """Hash data from any iterable of bytes chunks."""
    h = hashlib.sha256()
    for chunk in stream:
        h.update(chunk)
    return h.hexdigest()

Password Hashing (Don't Do This)
Plain hashing is not secure for passwords:
# BAD: unsalted and fast to compute, so vulnerable to rainbow tables and brute force
password_hash = hashlib.sha256(password.encode()).hexdigest()

Use hashlib.pbkdf2_hmac or, better, bcrypt/argon2:
import hashlib
import secrets

def hash_password(password: str) -> tuple[str, str]:
    """Hash password with PBKDF2 (stdlib option)."""
    salt = secrets.token_hex(16)
    key = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode(),
        salt.encode(),
        iterations=100_000
    )
    return key.hex(), salt

def verify_password(password: str, key_hex: str, salt: str) -> bool:
    """Verify password against stored hash."""
    new_key = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode(),
        salt.encode(),
        iterations=100_000
    )
    return secrets.compare_digest(new_key.hex(), key_hex)

Better: use bcrypt or argon2-cffi packages.
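If you stay with PBKDF2, it can help to store the iteration count and salt alongside the derived key, so the work factor can be raised later without breaking old records. The sketch below is one possible approach, not a standard: the `pbkdf2_sha256$...` record layout and the helper names `make_record` and `check_record` are hypothetical, and the default of 600_000 iterations is an assumption you should check against current guidance and your own hardware.

```python
import hashlib
import secrets


def make_record(password: str, iterations: int = 600_000) -> str:
    """Return 'pbkdf2_sha256$<iterations>$<salt hex>$<key hex>' (hypothetical format)."""
    salt = secrets.token_bytes(16)  # random per-password salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${key.hex()}"


def check_record(password: str, record: str) -> bool:
    """Recompute the key using the parameters stored in the record."""
    scheme, iterations, salt_hex, key_hex = record.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison of the raw key bytes.
    return secrets.compare_digest(key, bytes.fromhex(key_hex))


record = make_record("correct horse battery staple")
print(check_record("correct horse battery staple", record))  # True
print(check_record("wrong guess", record))                   # False
```

Because each record carries its own parameters, verification keeps working for old records after the default iteration count changes.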
HMAC for Message Authentication
Verify both integrity and authenticity:
import hmac
import hashlib

def sign_message(message: bytes, secret: bytes) -> str:
    """Create HMAC signature."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_signature(message: bytes, signature: str, secret: bytes) -> bool:
    """Verify HMAC signature."""
    expected = sign_message(message, secret)
    return hmac.compare_digest(signature, expected)

# Usage
secret = b'my-secret-key'
message = b'{"user": "alice", "action": "transfer"}'
sig = sign_message(message, secret)
print(verify_signature(message, sig, secret))  # True

Content Addressing
Use hashes as identifiers:
def content_address(data: bytes) -> str:
    """Generate content-based identifier."""
    # 16 hex characters keep only 64 bits of the digest: shorter IDs, weaker collision resistance
    return hashlib.sha256(data).hexdigest()[:16]

# Same content always gets same address
addr1 = content_address(b"Hello")
addr2 = content_address(b"Hello")
assert addr1 == addr2

Quick Reference
| Algorithm | Output Size | Use Case |
|---|---|---|
| MD5 | 128 bits | File checksums (non-security) |
| SHA-1 | 160 bits | Legacy compatibility only |
| SHA-256 | 256 bits | General purpose, recommended |
| SHA-512 | 512 bits | Extra security margin |
| BLAKE2b | Variable | Fast, secure alternative |
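The output sizes in the table can be checked directly from each hash object's `digest_size` attribute, which is reported in bytes. The short sketch below does that; the `sizes` dictionary is only an illustrative name, and the example assumes all five algorithms are enabled in your Python build.

```python
import hashlib

# Digest size in bits for each algorithm in the table.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512", "blake2b")
}
for name, bits in sizes.items():
    print(f"{name}: {bits} bits")

# BLAKE2b accepts a digest_size in bytes (1 to 64), hence "Variable" in the table.
short = hashlib.blake2b(b"Hello", digest_size=16)
print(short.digest_size * 8)  # 128
```

Note that BLAKE2b defaults to its maximum of 64 bytes (512 bits) when no `digest_size` is given.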

| Function | Purpose |
|---|---|
| hashlib.sha256(data) | Create hash object |
| hash.update(data) | Add more data |
| hash.hexdigest() | Get hex string |
| hash.digest() | Get raw bytes |
| hashlib.pbkdf2_hmac() | Key derivation |
| hashlib.new(name) | Dynamic algorithm selection |
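
The functions in the table compose naturally. As a rough sketch, the helper below selects an algorithm by name, feeds it data in pieces, and returns the hex digest. The helper name `digest_with` and the `ALLOWED` allow-list are hypothetical; restricting names this way is one design choice for cases where the algorithm name comes from configuration or user input.

```python
import hashlib

# Hypothetical allow-list: only accept algorithms we have decided to support.
ALLOWED = {"sha256", "sha512", "blake2b"}


def digest_with(name: str, *parts: bytes) -> str:
    """Hash the given parts with a named algorithm (hypothetical helper)."""
    if name not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {name}")
    h = hashlib.new(name)   # dynamic algorithm selection
    for part in parts:
        h.update(part)      # add more data
    return h.hexdigest()    # hex string; h.digest() would return raw bytes


one_shot = hashlib.sha256(b"Hello, World!").hexdigest()
incremental = digest_with("sha256", b"Hello, ", b"World!")
print(one_shot == incremental)  # True
```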
When to Use What
| Task | Solution |
|---|---|
| File integrity | SHA-256 |
| Download verification | SHA-256/SHA-512 |
| Password storage | bcrypt or argon2 |
| API signatures | HMAC-SHA256 |
| Content addressing | SHA-256 |
| Quick deduplication | MD5 (speed over security) |
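
To make the deduplication row concrete, here is a minimal sketch that groups in-memory blobs by their MD5 digest. The helper name `find_duplicates` and the sample data are hypothetical, and the `usedforsecurity=False` flag assumes Python 3.9 or later. Because MD5 collisions can be crafted deliberately, this is only suitable for data you trust; for untrusted input, swapping in SHA-256 is the safer choice.

```python
import hashlib
from collections import defaultdict


def find_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose content has the same MD5 digest (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in blobs.items():
        # usedforsecurity=False (Python 3.9+) marks this as a non-security use of MD5.
        digest = hashlib.md5(data, usedforsecurity=False).hexdigest()
        groups[digest].append(name)
    # Keep only digests shared by more than one name.
    return [names for names in groups.values() if len(names) > 1]


samples = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"different"}
print(find_duplicates(samples))  # [['a.txt', 'b.txt']]
```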
hashlib is foundational. Know when hashing alone is enough and when you need HMAC, KDFs, or specialized password hashing.