Python concurrent.futures ThreadPool vs ProcessPool Selector Prompt
Decide between ThreadPoolExecutor and ProcessPoolExecutor and wire up correct exception handling and chunking.
Target user: Python automation engineers parallelizing batch workloads
Difficulty: Intermediate
Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior Python engineer reviewing a parallelization design. I will give you a workload and you must pick the correct concurrent.futures executor and build a safe runner. Follow these steps exactly:
1. Classify the workload as I/O-bound or CPU-bound based on [WORKLOAD DESCRIPTION] (e.g. HTTP fetches, disk reads, shelling out vs. hashing, parsing, numeric crunching). State the classification and your reasoning in one sentence.
2. Pick the executor: ThreadPoolExecutor for I/O-bound work (GIL is released during I/O waits) or ProcessPoolExecutor for CPU-bound work (sidesteps the GIL via separate processes). Note that ProcessPoolExecutor requires picklable callables and arguments, and that the target function must be importable at module top level.
3. Recommend a worker count: for threads start near [TARGET CONCURRENCY] (default min(32, os.cpu_count() + 4)); for processes default to os.cpu_count(). Explain the trade-off for this specific workload.
4. Chunk the input: when there are [ITEM COUNT] items and each task is cheap, batch them into chunks of [CHUNK SIZE] so per-task scheduling overhead does not dominate; map each chunk to one future rather than one item per future.
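5. Submit work with executor.submit(...), collecting a dict of {future: item} (or {future: chunk} when chunking), then iterate with concurrent.futures.as_completed(futures, timeout=[OVERALL TIMEOUT]) so results stream back as they finish rather than blocking on the slowest task. Catch the concurrent.futures.TimeoutError that as_completed raises if the overall timeout elapses before every future has finished.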
6. Handle exceptions explicitly: inside the as_completed loop, call future.result() within a try/except. result() re-raises in the parent any exception raised inside the worker; catch it, log the failing item, increment a failure counter, and continue rather than aborting the whole run.
7. Use the executor as a context manager (with ... as executor:) so the pool is shut down and joined deterministically even on error.
Output format: return (a) a one-line classification verdict, (b) a single runnable Python module using concurrent.futures with the chosen executor, as_completed, per-future try/except, and chunking, and (c) a short bullet list of the failure modes you guarded against.
Idempotency/safety guardrail: the generated runner must be safe to re-run on a partial result set — each task must be independent and side-effect-free or write to a unique, deterministic output key per item, so a retry of failed items never double-processes successful ones.
Why this prompt works
The single most common parallelization mistake in Python automation is reaching for the wrong executor. Threads and processes look interchangeable in the concurrent.futures API (both executors expose submit and map, and futures from either work with the module-level as_completed function), but the GIL makes them behave very differently. Threads only buy you concurrency when the work spends most of its time waiting on I/O, because the GIL is released during blocking syscalls. CPU-bound work pinned behind the GIL will run no faster (and often slower, due to contention) on a ThreadPoolExecutor; it needs the true parallelism of separate OS processes. This prompt forces the classification step first, so the model commits to a justification before generating code, which is exactly how a senior engineer would reason about it.
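As a rough illustration of that classification-first decision, the sketch below maps a workload label to an executor class and a starting worker count. The function name pick_executor and the labels "io" and "cpu" are hypothetical, chosen only for this example; the worker counts follow the defaults named in the prompt.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor(workload_kind):
    """Return (executor_class, starting_max_workers) for a workload label.

    `workload_kind` is a hypothetical label for this sketch: "io" or "cpu".
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    if workload_kind == "io":
        # Threads: the GIL is released while a thread waits on I/O.
        return ThreadPoolExecutor, min(32, cpus + 4)
    if workload_kind == "cpu":
        # Processes: each worker runs in its own interpreter with its own GIL.
        return ProcessPoolExecutor, cpus
    raise ValueError(f"unknown workload kind: {workload_kind!r}")


executor_cls, workers = pick_executor("io")
print(executor_cls.__name__, workers)
```

These counts are only starting points; the right number for a given workload still depends on downstream rate limits, memory per worker, and how long each task actually waits.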
The second failure mode is silent exceptions. A future that raises inside a worker does not crash the program — the exception is stored and only surfaces when you call future.result(). Engineers who iterate over futures without a try/except end up with runs that appear successful but quietly drop a fraction of their items. By mandating as_completed with explicit per-future error handling, the prompt produces a runner that streams results as they finish and accounts for every failure, which is essential for any batch job you intend to retry.
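A minimal sketch of that per-future handling, assuming a hypothetical task named flaky that fails on multiples of 3. It uses ThreadPoolExecutor so it runs anywhere; the same loop applies to ProcessPoolExecutor provided the task is a picklable top-level function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky(n):
    # Hypothetical task for illustration: fails on multiples of 3.
    if n % 3 == 0:
        raise ValueError(f"bad item {n}")
    return n * 10


items = [1, 2, 3, 4, 5, 6]
results, failures = {}, {}

with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its input so failures can be attributed.
    futures = {executor.submit(flaky, n): n for n in items}
    for future in as_completed(futures):
        item = futures[future]
        try:
            # result() re-raises here whatever the worker raised.
            results[item] = future.result()
        except Exception as exc:
            failures[item] = exc  # record the failure and keep going

print(sorted(results))   # [1, 2, 4, 5]
print(sorted(failures))  # [3, 6]
```

If the try/except and the result() call were dropped, the loop would finish without any error and items 3 and 6 would simply be missing from the output.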
Finally, the chunking and idempotency guardrails address scale and correctness together. Submitting one future per item across hundreds of thousands of cheap tasks lets per-task overhead (future bookkeeping, plus pickling and inter-process transfer in a process pool) outweigh the work itself, so the prompt asks for batching. And because parallel runners are almost always re-run after partial failures, the idempotency requirement ensures that retrying the failed subset never corrupts or duplicates work that already succeeded, turning a fragile script into a dependable piece of automation.
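The sketch below shows one way chunking and re-run safety can fit together, under stated assumptions: the helper names chunked, process_chunk, and run, the "item-N" output keys, and the in-memory dict standing in for an output store are all hypothetical. Each item writes to a deterministic key, and items whose key already exists are skipped on a re-run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_chunk(chunk):
    # Hypothetical cheap per-item work; one deterministic output key per item.
    return {f"item-{n}": n * n for n in chunk}


def run(items, store, chunk_size=3, max_workers=4):
    """Process items missing from `store`; return the number of failed chunks."""
    pending = [n for n in items if f"item-{n}" not in store]
    failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # One future per chunk rather than one per item.
        futures = {
            executor.submit(process_chunk, chunk): chunk
            for chunk in chunked(pending, chunk_size)
        }
        for future in as_completed(futures):
            try:
                store.update(future.result())
            except Exception:
                failed += 1  # items in this chunk stay pending for the next run
    return failed


store = {"item-0": 0, "item-1": 1}  # pretend a previous run got this far
failed_chunks = run(range(10), store)
print(failed_chunks, len(store))
```

Calling run again with the same store submits nothing, because every key is already present; after a partial failure, only the items from failed chunks would be resubmitted.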
Related prompts
- Python asyncio Semaphore Bounded-Concurrency Review Prompt
  Review an asyncio script that fans out work to find unbounded concurrency, then redesign it with a semaphore-bounded task pool, proper cancellation, backpressure, and clean shutdown so it can't overwhelm downstreams.
- Python Multiprocessing CPU Batch Worker Prompt
  Build a CPU-bound batch processor in Python using multiprocessing/ProcessPoolExecutor, with chunking, worker isolation, progress, graceful shutdown, and result aggregation, to saturate all cores without the GIL bottleneck.
- Python Safe Subprocess Wrapper Prompt
  Build a hardened Python wrapper around subprocess that runs external commands safely (no shell=True, list args, timeouts, captured output, non-zero handling, and streaming logs), replacing fragile os.system and shell-string calls.