CloudOps
AI for Bash & Python Automation

Python Object Storage Sync Script Prompt

Write a resumable one-way sync to S3-compatible object storage (checksum-based change detection, multipart uploads, concurrency, dry-run, and delete-extras guardrails) without shelling out to the AWS CLI.

Target user
Engineers building backup, artifact, and asset-publishing automation
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior engineer who has written object-storage sync tooling that moves terabytes nightly without re-uploading unchanged files or silently deleting the wrong bucket. Build a one-way local-to-object-storage sync in Python.

I will provide:
- The provider (AWS S3, MinIO, Cloudflare R2, or Google Cloud Storage via its S3-compatible XML API) and endpoint/region
- Source directory layout and roughly how it changes between runs
- Whether the destination should mirror exactly (delete extras) or only add/update
- Object size distribution and how much bandwidth/concurrency is acceptable
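The provider, endpoint, and concurrency answers map fairly directly onto boto3 client construction. A minimal sketch, assuming boto3/botocore are installed; `make_client` and its parameters are hypothetical names, not part of any library, and the retry and pool values are placeholders:

```python
from typing import Optional


def make_client(endpoint_url: Optional[str] = None, region: str = "us-east-1",
                workers: int = 8, path_style: bool = False):
    """Hypothetical helper: one S3 client for AWS, MinIO, R2, or GCS interop.

    No keys are passed in; boto3 resolves credentials from its default chain
    (environment variables, shared config/profile, or an attached IAM role).
    """
    # Imported lazily so the rest of a sync module stays importable without boto3.
    import boto3
    from botocore.config import Config

    config = Config(
        region_name=region,
        # "standard" and "adaptive" both retry throttling (e.g. SlowDown) and 503s
        # with jittered exponential backoff; "adaptive" adds client-side rate limiting.
        retries={"mode": "adaptive", "max_attempts": 10},
        # Keep at least one pooled HTTP connection per worker thread.
        max_pool_connections=max(10, workers),
        # Self-hosted MinIO commonly needs path-style URLs (bucket in the path).
        s3={"addressing_style": "path" if path_style else "auto"},
    )
    # Low-level clients are thread-safe, so this one client can be shared by all workers.
    return boto3.client("s3", endpoint_url=endpoint_url, config=config)
```

For a local MinIO this might be called as `make_client("http://localhost:9000", workers=16, path_style=True)`; for AWS, leaving `endpoint_url` as `None` lets boto3 pick the regional endpoint.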

Your job:

1. **Use the SDK, not the CLI**: `boto3` with a configurable `endpoint_url` so the same code targets AWS, MinIO, and R2. Read credentials from the standard chain (environment, profile, IAM role); never hardcode keys.

2. **Detect changes correctly**: compare local files to remote objects by size first, then by checksum. Explain S3's ETag caveats (a multipart ETag is not a plain MD5 of the object, and neither is the ETag of an object encrypted with SSE-KMS or SSE-C), and prefer storing your own content hash in object metadata so change detection stays correct across multipart thresholds. Skip unchanged files entirely.

3. **Upload efficiently**: use `upload_file` with a `TransferConfig` tuned for threshold, part size, and concurrency so large files switch to multipart automatically; pass content-type, cache-control, and the content hash (as user metadata) through `ExtraArgs`.

4. **Parallelize safely**: run files through a bounded `ThreadPoolExecutor`. boto3 `Session` objects are not thread-safe, but low-level clients are, so create one client up front and share it across workers (or create one per thread via a thread-local). Make the worker count configurable, remembering that each multipart upload adds its own `max_concurrency` threads on top of it, and back off on throttling (`SlowDown`/503) with jittered retries layered over botocore's built-in `standard` or `adaptive` retry mode.

5. **Guard --delete**: mirroring must require an explicit `--delete` flag, support `--dry-run` showing every add/update/delete, and refuse to delete more than a configurable percentage of existing objects without a `--force` override (the classic "empty prefix wipes the bucket" footgun).

6. **Be resumable**: uploads are idempotent by key, so a crashed run simply re-runs and skips objects that already match. Make sure incomplete multipart uploads left behind by a crash are aborted, either with a bucket lifecycle rule using `AbortIncompleteMultipartUpload` where the provider supports it, or by calling `list_multipart_uploads` and `abort_multipart_upload` for the sync prefix, so their parts stop accruing storage charges.

7. **Report**: print and log counts (scanned, uploaded, skipped-unchanged, deleted, failed, bytes moved, elapsed) as a single summary line fit for a cron email.
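The size-then-hash check and the upload settings described above could look roughly like the sketch below. It assumes the content hash is a SHA-256 stored under a hypothetical user-metadata key `sha256` (boto3 returns user metadata from `head_object` lower-cased and without the `x-amz-meta-` prefix, so it appears as `Metadata["sha256"]`); `head_or_none`, `needs_upload`, `upload_extra_args`, and `upload` are illustrative names, and the part sizes are placeholders to tune against your own object-size distribution:

```python
import hashlib
import mimetypes
import os
from typing import Optional, Tuple

HASH_KEY = "sha256"  # hypothetical metadata key; stored on S3 as x-amz-meta-sha256


def local_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large inputs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def head_or_none(client, bucket: str, key: str) -> Optional[dict]:
    """Return the head_object response, or None if the key does not exist."""
    try:
        return client.head_object(Bucket=bucket, Key=key)
    except client.exceptions.ClientError as exc:
        if exc.response.get("Error", {}).get("Code") in ("404", "NoSuchKey", "NotFound"):
            return None
        raise


def needs_upload(path: str, head: Optional[dict]) -> Tuple[bool, str]:
    """Return (changed, local_digest) for `path` versus the remote `head`."""
    size = os.path.getsize(path)
    # The digest is computed either way because it is stored as metadata on upload.
    digest = local_sha256(path)
    # A missing key or a size mismatch is decisive without looking at metadata.
    if head is None or head.get("ContentLength") != size:
        return True, digest
    # Same size: trust only our own hash, never the ETag. A missing hash counts as changed.
    return head.get("Metadata", {}).get(HASH_KEY) != digest, digest


def upload_extra_args(path: str, digest: str, cache_control: Optional[str] = None) -> dict:
    """Build ExtraArgs for upload_file: content type, optional cache-control, hash metadata."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    args = {"ContentType": content_type, "Metadata": {HASH_KEY: digest}}
    if cache_control:
        args["CacheControl"] = cache_control
    return args


def upload(client, path: str, bucket: str, key: str, digest: str,
           cache_control: Optional[str] = None) -> int:
    """Upload one file, switching to multipart above the threshold; returns bytes sent."""
    from boto3.s3.transfer import TransferConfig  # lazy import; requires boto3

    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,  # placeholder: multipart above 64 MiB
        multipart_chunksize=16 * 1024 * 1024,  # placeholder part size
        max_concurrency=4,  # threads per file, multiplied by the outer worker count
    )
    client.upload_file(path, bucket, key,
                       ExtraArgs=upload_extra_args(path, digest, cache_control),
                       Config=config)
    return os.path.getsize(path)
```

`upload_file` delegates to s3transfer, which aborts a multipart upload when the transfer fails inside the process; a killed process cannot do that, which is why the resumability step still needs a cleanup path for orphaned parts.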

Output: (a) the sync module with change-detection and transfer config, (b) the parallel executor with throttling backoff, (c) the dry-run and delete-guard logic, (d) a pytest suite using a MinIO/moto fixture covering add, update, skip, and delete-guard paths.
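For output (b), one possible shape for the executor is a bounded pool, a jittered exponential backoff that retries only throttling errors, and counters that feed the one-line summary from the reporting step. The throttle check duck-types botocore's `ClientError.response` dict so the sketch runs without boto3; `run_all`, `with_backoff`, `Stats`, and the `(outcome, nbytes)` task contract are hypothetical. Because botocore already retries each request, this outer loop mainly matters for whole-file operations that still fail after those retries are exhausted:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, Iterable, Tuple

THROTTLE_CODES = {"SlowDown", "Throttling", "RequestLimitExceeded", "503"}


def is_throttle(exc: BaseException) -> bool:
    """True for errors shaped like botocore's ClientError that signal throttling."""
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    return code in THROTTLE_CODES or status == 503


def with_backoff(fn: Callable, item, *, attempts: int = 5, base: float = 0.5,
                 cap: float = 20.0, sleep: Callable[[float], None] = time.sleep):
    """Call fn(item), retrying throttling errors with 'full jitter' backoff."""
    for attempt in range(attempts):
        try:
            return fn(item)
        except Exception as exc:
            if not is_throttle(exc) or attempt == attempts - 1:
                raise  # non-throttling errors and the final attempt propagate
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


@dataclass
class Stats:
    scanned: int = 0
    uploaded: int = 0
    skipped: int = 0
    deleted: int = 0
    failed: int = 0
    bytes_moved: int = 0
    started: float = field(default_factory=time.monotonic)

    def summary(self) -> str:
        elapsed = time.monotonic() - self.started
        return (f"scanned={self.scanned} uploaded={self.uploaded} "
                f"skipped={self.skipped} deleted={self.deleted} failed={self.failed} "
                f"bytes={self.bytes_moved} elapsed={elapsed:.1f}s")


def run_all(task: Callable[[str], Tuple[str, int]], items: Iterable[str],
            workers: int = 8, sleep: Callable[[float], None] = time.sleep) -> Stats:
    """Run task(item) -> ("uploaded" | "skipped", nbytes) over a bounded pool."""
    stats = Stats()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submits everything up front; for millions of keys, submit in batches instead.
        futures = [pool.submit(with_backoff, task, item, sleep=sleep) for item in items]
        # Counters are only touched on this thread, so no lock is needed.
        for future in as_completed(futures):
            stats.scanned += 1
            try:
                outcome, nbytes = future.result()
            except Exception:
                stats.failed += 1  # a real tool would log the key and exception here
                continue
            if outcome == "uploaded":
                stats.uploaded += 1
                stats.bytes_moved += nbytes
            else:
                stats.skipped += 1
    return stats
```

Keeping all counter updates on the collecting thread avoids locks entirely; the delete phase would set `stats.deleted` before `stats.summary()` is printed.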

Be opinionated: SDK over CLI, hash-in-metadata over ETag guessing, dry-run by default for deletes.
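The delete guardrails and the dry-run-by-default stance can live in pure planning logic that is easy to unit-test before any client call is made. A minimal sketch, assuming the local and remote listings have already been reduced to `{key: content_hash}` dicts; `build_plan`, `execute`, `batches`, `DeleteGuardError`, and the 10% default threshold are illustrative choices, not from any library:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


class DeleteGuardError(RuntimeError):
    """Raised when a run would delete more than the allowed share of objects."""


@dataclass
class Plan:
    add: List[str] = field(default_factory=list)
    update: List[str] = field(default_factory=list)
    delete: List[str] = field(default_factory=list)


def build_plan(local: Dict[str, str], remote: Dict[str, Optional[str]], *,
               delete: bool = False, max_delete_pct: float = 10.0,
               force: bool = False) -> Plan:
    plan = Plan()
    for key, digest in sorted(local.items()):
        if key not in remote:
            plan.add.append(key)
        elif remote[key] != digest:  # a missing remote hash (None) counts as changed
            plan.update.append(key)
    if delete:  # extras are only considered with an explicit --delete
        plan.delete = sorted(set(remote) - set(local))
        share = 100.0 * len(plan.delete) / len(remote) if remote else 0.0
        # An empty or unmounted source makes share == 100%, which trips this guard.
        if share > max_delete_pct and not force:
            raise DeleteGuardError(
                f"refusing to delete {len(plan.delete)} of {len(remote)} objects "
                f"({share:.1f}% > {max_delete_pct}%); rerun with --force")
    return plan


def batches(keys: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Chunk keys for DeleteObjects, which accepts at most 1,000 keys per request."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def execute(plan: Plan, upload: Callable[[str], None],
            delete_batch: Callable[[List[str]], None], *,
            dry_run: bool = True, log: Callable[[str], None] = print) -> None:
    """Log every action; only call upload/delete_batch when dry_run is False."""
    prefix = "DRY-RUN " if dry_run else ""
    for action, keys in (("add", plan.add), ("update", plan.update)):
        for key in keys:
            log(f"{prefix}{action} {key}")
            if not dry_run:
                upload(key)
    for batch in batches(plan.delete):
        for key in batch:
            log(f"{prefix}delete {key}")
        if not dry_run:
            delete_batch(batch)
```

With boto3, `delete_batch` would wrap `client.delete_objects(Bucket=..., Delete={"Objects": [{"Key": k} for k in batch]})` and should inspect the response's `Errors` list, since a successful HTTP response can still report per-key failures.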