Category: AI for Bash & Python Automation

Python Multiprocessing CPU Batch Worker Prompt

Build a CPU-bound batch processor in Python using multiprocessing/ProcessPoolExecutor — chunking, worker isolation, progress, graceful shutdown, and result aggregation — to saturate all cores without the GIL bottleneck.

Target user
Python developers processing large CPU-heavy batch jobs
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior Python engineer who parallelizes CPU-bound batch work across cores. You know the difference between I/O-bound (use asyncio/threads) and CPU-bound (use processes), and you reach for multiprocessing only when profiling proves the GIL is the bottleneck.

I will provide:
- The per-item work (hashing, image resize, parsing, numeric crunching)
- Item count and per-item cost
- Whether items are independent or share state/ordering needs
- The target hardware (core count, memory)

Your job:

1. **Confirm it's CPU-bound first** — quickly sanity-check that the work is actually CPU-bound; if it's I/O-bound, redirect me to threads/asyncio instead of paying process-spawn overhead. Be honest about this.

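2. **Pick the pool** — default to `concurrent.futures.ProcessPoolExecutor` for clarity; mention `multiprocessing.Pool` and when each is preferable. Set `max_workers` from `os.cpu_count()` (or one less to keep the box responsive), make it overridable, and account for two caveats: `os.cpu_count()` can return `None`, and it reports the host's logical CPUs rather than the cores a container or CPU-affinity mask actually allows the process to use.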

3. **Chunk the work** — for many small items, batch them into chunks per task to amortize IPC/pickling overhead; size chunks so each task runs ~seconds, not milliseconds. Explain the pickling cost of args and return values.

4. **Worker design** — keep the worker function top-level (picklable), pure, and side-effect-light; pass only what's needed (avoid shipping huge objects). Use an initializer for expensive per-worker setup (loading a model, opening a DB connection pool) so it runs once per worker process rather than once per task.

5. **Progress + streaming results** — consume `concurrent.futures.as_completed()` (or `Pool.imap_unordered()` if using `multiprocessing.Pool`) so results stream back in completion order and a progress bar updates; don't wait for the slowest task before starting to aggregate.

6. **Graceful shutdown** — handle KeyboardInterrupt/SIGTERM so a Ctrl-C doesn't leave orphaned workers; ensure the pool is shut down in a `finally`/context manager and in-flight work is cancelled or drained per the requirement.

7. **Error handling** — decide per-item failure policy: collect-and-continue (return error tuples) vs fail-fast; never let one bad item kill the whole batch silently.

8. **Benchmark** — give me a snippet comparing serial vs pooled wall-time and effective speedup, and note diminishing returns / memory ceilings.

Output: (a) the worker + driver using ProcessPoolExecutor with chunking and `as_completed`, (b) the initializer pattern, (c) graceful-shutdown handling, (d) the serial-vs-parallel benchmark. Bias toward proving CPU-bound first and right-sizing chunks over raw worker count.
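
For orientation, here is a minimal sketch of the kind of answer this prompt is asking for, not a reference implementation. It assumes the per-item work is SHA-256 hashing of short strings, and every name in it (`init_worker`, `process_chunk`, `chunked`, `run_batch`, the `salt` argument) is hypothetical. It shows chunking, a per-worker initializer, streaming results with `as_completed`, and a collect-and-continue error policy. Interrupt handling is deliberately simplified: SIGTERM is not covered, and chunks already running are left to finish.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor, as_completed

_SALT = b""  # per-worker state, set once by the initializer


def init_worker(salt):
    """Runs once in each worker process; stands in for expensive setup."""
    global _SALT
    _SALT = salt


def process_chunk(chunk):
    """Top-level (picklable) worker: hash every item in one chunk.

    Returns (item, digest, error) tuples so one bad item cannot
    take down the rest of the chunk.
    """
    results = []
    for item in chunk:
        try:
            digest = hashlib.sha256(_SALT + item.encode()).hexdigest()
            results.append((item, digest, None))
        except Exception as exc:  # collect-and-continue policy
            results.append((item, None, repr(exc)))
    return results


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(items, chunk_size=1000, max_workers=None, salt=b"demo"):
    """Fan chunks out to a process pool and aggregate as they complete."""
    max_workers = max_workers or os.cpu_count() or 1  # cpu_count() may be None
    results = []
    with ProcessPoolExecutor(
        max_workers=max_workers,
        initializer=init_worker,
        initargs=(salt,),
    ) as pool:
        futures = [pool.submit(process_chunk, c) for c in chunked(items, chunk_size)]
        try:
            for done, fut in enumerate(as_completed(futures), 1):
                results.extend(fut.result())
                print(f"\rchunks done: {done}/{len(futures)}", end="", flush=True)
        except KeyboardInterrupt:
            # Cancel chunks that have not started yet, then re-raise so the
            # context manager shuts the pool down.
            for f in futures:
                f.cancel()
            raise
    print()
    return results


if __name__ == "__main__":
    out = run_batch([f"item-{i}" for i in range(10_000)])
    print(len(out), "results")
```

Results arrive in completion order, not input order, which is why each tuple carries its own item. The default `chunk_size` of 1000 is only a placeholder; for work this cheap a real run would need much larger chunks to approach the "seconds, not milliseconds" target from step 3.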