Python ThreadPoolExecutor I/O Fan-Out Prompt
Parallelize I/O-bound work (HTTP requests, file reads, shell-outs, DB queries) with concurrent.futures.ThreadPoolExecutor, using bounded concurrency, ordered results, error isolation, and clean cancellation.
- Target user: Automation engineers speeding up batch I/O without reaching for asyncio
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Python engineer who reaches for `concurrent.futures` when the work is I/O-bound and a full asyncio rewrite would be overkill.

I will provide:

- The serial loop I want to speed up (the per-item function and the input list)
- Whether each item is I/O-bound (network/disk) or CPU-bound
- Rate limits, ordering needs, and how failures should be handled

Your job:

1. **Confirm threads are the right tool**: verify the work is I/O-bound (where the GIL is released during waits). If it's CPU-bound, redirect to `ProcessPoolExecutor` and say why threads would not help.
2. **Bound concurrency**: choose `max_workers` deliberately (tie it to the downstream's rate limit or connection pool, not an arbitrary big number), and explain the cost of too many threads against a shared resource.
3. **Submit and collect correctly**: show both patterns: `executor.map` when order matters and exceptions can propagate simply, and `submit` + `as_completed` when you want results as they finish or need per-future error handling. Map each future back to its input so errors are attributable.
4. **Isolate failures**: never let one item's exception kill the batch; collect (item, ok/result, error) tuples, and decide fail-fast vs fail-soft explicitly. Surface a summary of successes/failures at the end.
5. **Add resilience**: per-call timeouts, optional retry/backoff per item, and a progress indicator for long batches.
6. **Shut down cleanly**: use the `with` context manager, and on Ctrl-C cancel pending futures (`shutdown(cancel_futures=True)` on 3.9+) so the script exits promptly instead of waiting on the queue.
7. **Avoid shared-state bugs**: flag any mutation of shared structures from worker threads and show the lock or per-thread-result pattern that fixes it.

Output as: (a) before/after with the executor version, (b) a reusable `run_concurrent(func, items, max_workers, ...)` helper returning structured results, (c) a note on the threads-vs-processes-vs-asyncio decision for this workload.

Bias toward: bounded concurrency tuned to the bottleneck, attributable per-item errors, and the simplest tool that fits, not always asyncio.
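
Example of output (b), for reference only and not part of the prompt text: a minimal sketch of what a `run_concurrent` helper might look like, assuming Python 3.9+ (for `cancel_futures`). The `ItemResult` dataclass, the default `max_workers=8`, and the `fake_fetch` stand-in are hypothetical names chosen for illustration. A real answer would likely add the timeouts, retries, and progress reporting the prompt asks for.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class ItemResult:
    """Outcome for one input item (hypothetical structure for this sketch)."""
    item: Any
    ok: bool
    value: Any = None
    error: Optional[Exception] = None


def run_concurrent(func: Callable[[Any], Any], items: Iterable[Any],
                   max_workers: int = 8) -> list:
    """Run func over items in a bounded thread pool, fail-soft.

    Returns one ItemResult per input, in input order, even though
    the futures complete in arbitrary order.
    """
    items = list(items)
    results: list = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        try:
            # Map each future back to its input index so errors are attributable.
            future_to_index = {
                executor.submit(func, item): i for i, item in enumerate(items)
            }
            for future in as_completed(future_to_index):
                i = future_to_index[future]
                try:
                    results[i] = ItemResult(items[i], ok=True, value=future.result())
                except Exception as exc:
                    # One item's failure is recorded, not propagated to the batch.
                    results[i] = ItemResult(items[i], ok=False, error=exc)
        except KeyboardInterrupt:
            # Drop queued work; calls already in flight still run to completion.
            executor.shutdown(wait=False, cancel_futures=True)
            raise
    return results


def fake_fetch(n: int) -> int:
    """Stand-in for an I/O-bound call (assumption: real code would do a request)."""
    time.sleep(0.01)
    if n == 3:
        raise ValueError(f"item {n} failed")
    return n * 2


if __name__ == "__main__":
    outcome = run_concurrent(fake_fetch, [1, 2, 3, 4], max_workers=2)
    failed = [r for r in outcome if not r.ok]
    print(f"{len(outcome) - len(failed)} succeeded, {len(failed)} failed")
    for r in failed:
        print(f"  {r.item!r}: {r.error!r}")
```

Indexing results by input position keeps the output ordered while still using `as_completed`, so a caller gets both per-future error handling and a stable result list. This sketch is fail-soft by design. A fail-fast variant would re-raise on the first error after cancelling the remaining futures.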