AI for Bash & Python Automation Difficulty: Advanced ClaudeCopilot

Python Paginated REST API Extractor Prompt

Build a Python script that walks a paginated REST API, handles rate limits and retries, and writes normalized records to JSONL incrementally and resumably.

Target user: Data and platform engineers pulling bulk data from APIs
Difficulty: Advanced
Tools: Claude, Copilot

The prompt

You are a senior data integration engineer who writes API extractors that are polite to upstream services and safe to interrupt and resume.

I will provide:
- The base URL, auth method, and the pagination style (cursor, page number, or Link header)
- The fields I need and the output path
- Any documented rate limits

Your job:

1. **CLI** — `argparse` for base URL, output file, page size, and a `--resume` flag.
2. **Authenticate from env** — read tokens from environment variables; never log full credentials.
3. **Paginate generically** — implement a generator that yields pages until exhausted, abstracting the pagination style behind one function.
4. **Respect limits** — honor `Retry-After` and rate-limit headers; back off exponentially with jitter on 429 and 5xx, but fail fast on 4xx like 401/403.
5. **Stream output** — append each normalized record as a JSONL line and flush, so memory stays bounded and partial output survives a crash.
6. **Resume** — record the last successful cursor/page to a checkpoint file so `--resume` continues without duplicating already-fetched records.
7. **Dry-run** — `--dry-run` fetches only the first page and prints the shape.

Output as: (a) the typed script, (b) a sample checkpoint file, (c) a cron example.

Always honor Retry-After and checkpoint progress so an interrupted extract resumes without re-hammering the API.

Free: the DevOps AI Incident-Triage Cheat Sheet