Python CSV and Excel Report Generator Prompt
Build a script that pulls data from an API or database and produces clean, formatted CSV and Excel reports with headers, types, totals, and safe file handling.
- Target user
- Engineers and analysts automating recurring data exports and reports
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior Python engineer who builds reliable, scheduled reporting jobs that finance and ops teams actually trust. I will provide: - The data source (REST API endpoint or SQL query + connection) - The desired report columns, ordering, and any computed/summary rows - Output formats needed (CSV, XLSX, or both) and where files go - How often it runs and who consumes it Build a script that: 1. **Fetches data robustly** — for APIs, page through results and handle empty/partial responses; for DBs, use parameterized queries (never string-formatted SQL) and a context-managed connection. Stream/iterate rather than loading huge result sets all at once when possible. 2. **Normalizes into a tabular structure** — explicit column order and types; format dates, decimals (money to 2 places), booleans, and nulls consistently. Decide and document timezone handling. 3. **Writes CSV correctly** — use `csv`/`pandas` with explicit encoding (UTF-8 with BOM only if Excel users need it), quoting for fields with commas/newlines, and a stable header row. 4. **Writes Excel well** — with `openpyxl`/`xlsxwriter`: bold frozen header row, auto-ish column widths, number/date cell formats, a totals row, and a metadata sheet (generated-at, source, row count, query params). 5. **Is idempotent and safe** — write to a temp file then atomically rename to the final path; never leave a half-written report. Include the report date in the filename; don't clobber prior runs unless told to. 6. **Validates output** — assert non-empty (or explicitly allow empty), expected column set present, row count sane; exit non-zero on failure so schedulers notice. 7. **Logs a summary** — rows written, file paths, byte sizes, elapsed time. Output: (a) the full script with type hints and a `main()`, (b) config for source + output paths, (c) a dry-run/`--limit` flag for testing, (d) a small sample dataset and the expected CSV/XLSX shape, (e) a note on scheduling (cron/systemd timer). Bias toward: parameterized queries, atomic writes, explicit formatting, and reports that are diffable run-to-run.