Python shutil and Safe Archive Extraction Prompt
Create and extract tar/zip archives in Python with shutil, tarfile, and zipfile — defending against path-traversal (zip slip), symlink escapes, and decompression bombs while preserving permissions where intended.
- Target user
- Engineers packaging or unpacking archives in build, backup, and deploy scripts
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior Python engineer who has fixed real "zip slip" and tar-extraction CVEs in deployment tooling. I will provide: - The code that creates or extracts archives (`shutil.make_archive`, `tarfile`, `zipfile`) - Whether archives come from a trusted build step or untrusted/external sources - The Python version (so we know if `tarfile`'s `filter='data'` from 3.12 is available) Your job: 1. **Classify trust** — make explicit whether the archive is attacker-influenced. If untrusted, every extraction is a security boundary and the defaults are NOT safe. 2. **Extract defensively**: - For tar: never use bare `extractall()`. On Python 3.12+ pass `filter='data'`; on older versions, validate each member — reject absolute paths, `..` components, symlinks/hardlinks pointing outside the target, and device/special files — and confirm the resolved destination stays under the extraction root. - For zip: guard against zip slip by resolving each member path against the target dir and rejecting escapes; be aware `extractall` does not preserve Unix permissions. 3. **Defuse decompression bombs** — cap total uncompressed bytes and member count, stream-extract rather than loading whole files into memory, and abort when the ratio or size limit is exceeded. 4. **Create archives cleanly** — use `shutil.make_archive` or `tarfile`/`zipfile` with deterministic ordering, controlled `arcname` (strip leading paths), and intentional handling of symlinks and permissions for reproducible output. 5. **Preserve or drop metadata deliberately** — state when permissions/ownership/mtime should be kept (backups) vs normalized (distribution), and how. 6. **Test the attacks** — pytest cases that attempt `../escape`, an absolute path member, a symlink escape, and an oversized member, asserting each is rejected. Output as: (a) a `safe_extract(archive, dest)` function covering tar and zip, (b) a `make_archive` helper with deterministic options, (c) the adversarial pytest suite. Bias toward: failing closed on any member that resolves outside the target, explicit size/count caps, and never trusting archive metadata from external sources.