
Python Graceful Shutdown and Signal Handling Prompt

Add robust SIGTERM/SIGINT handling to a long-running Python worker so it finishes in-flight work, releases resources, and exits cleanly within the container's grace period instead of being SIGKILLed.

Target user: Engineers running Python workers, daemons, and batch jobs under Kubernetes or systemd
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a Python reliability engineer who has made dozens of workers shut down cleanly under Kubernetes SIGTERM and systemd `TimeoutStopSec`.

I will provide:
- The worker's main loop (poller, queue consumer, or batch job)
- The resources it holds (DB connections, file handles, locks, in-flight tasks)
- The platform grace period (e.g. K8s `terminationGracePeriodSeconds=30`)

Your job:

1. **Signal contract** — explain what SIGTERM, SIGINT, and SIGKILL mean here: SIGTERM is the graceful request, SIGINT is Ctrl-C in dev, SIGKILL cannot be caught. The goal is to fully drain before the grace period expires and SIGKILL lands.

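2. **Handler design** — install handlers via `signal.signal` that only set a `threading.Event` / asyncio shutdown flag rather than doing real work inside the handler. CPython runs Python-level handlers on the main thread between bytecode instructions, so a handler can interrupt code that is holding a lock or is partway through I/O; logging, blocking calls, or lock acquisition inside the handler can deadlock or re-enter. The main loop checks the flag at safe points.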

3. **Draining** — stop accepting new work immediately, let in-flight tasks finish up to a deadline, then force-cancel stragglers. Track a deadline (`grace_period - safety_margin`) and log how many items drained vs. were abandoned.

4. **Resource teardown** — close DB pools, flush buffers, release `flock`/advisory locks, and ack/nack queue messages in the right order. Use a single teardown function guarded so it runs at most once, and make a second signal (impatient operator hits Ctrl-C twice) escalate from graceful drain to immediate exit.

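5. **asyncio variant** — show `loop.add_signal_handler` (available on Unix event loops only) driving an `asyncio.Event`, giving in-flight tasks until the drain deadline, cancelling the stragglers, and awaiting `asyncio.gather(..., return_exceptions=True)` so cancellations are collected rather than raised.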

6. **Observability** — emit a structured log at shutdown start, per-resource close, and final exit code; expose a "shutting down" state to readiness probes so the orchestrator stops routing traffic.

7. **Verification** — a test that sends SIGTERM mid-work and asserts in-flight items complete, resources close exactly once, and exit code is 0.
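
For reference, items 2 to 4 can be sketched roughly as follows for a synchronous worker. This is a minimal illustration under stated assumptions, not a definitive implementation: `handle_signal`, `install_handlers`, `Teardown`, `drain`, `run_worker`, and the 30 s / 5 s timing constants are hypothetical names and values chosen for the example, and `fetch` / `process` stand in for whatever the real worker does.

```python
import signal
import threading
import time

shutdown = threading.Event()   # set by the first SIGTERM/SIGINT
GRACE_PERIOD = 30.0            # assumed to mirror terminationGracePeriodSeconds
SAFETY_MARGIN = 5.0            # hypothetical buffer before SIGKILL arrives


def handle_signal(signum, frame):
    """Only flip the flag; a second signal escalates to an immediate exit."""
    if shutdown.is_set():
        # SystemExit unwinds through `finally`, so teardown still runs once.
        raise SystemExit(128 + signum)
    shutdown.set()


def install_handlers():
    """Must be called from the main thread (a signal.signal requirement)."""
    signal.signal(signal.SIGTERM, handle_signal)
    signal.signal(signal.SIGINT, handle_signal)


class Teardown:
    """Idempotent teardown: closers run at most once, in registration order."""

    def __init__(self):
        self._closers = []
        self._done = False

    def register(self, name, closer):
        self._closers.append((name, closer))

    def run(self):
        if self._done:
            return []
        self._done = True
        closed = []
        for name, closer in self._closers:
            closer()
            closed.append(name)
        return closed


def drain(in_flight, process, deadline):
    """Finish queued items until the deadline; return (drained, abandoned)."""
    drained = 0
    while in_flight and time.monotonic() < deadline:
        process(in_flight.pop(0))
        drained += 1
    return drained, len(in_flight)


def run_worker(fetch, process, teardown,
               grace=GRACE_PERIOD, margin=SAFETY_MARGIN, idle_sleep=0.1):
    in_flight = []
    try:
        while not shutdown.is_set():
            batch = fetch()
            if not batch:
                time.sleep(idle_sleep)   # poll the flag instead of blocking on it
                continue
            in_flight.extend(batch)
            # Safe point: check the flag between items, never mid-item.
            while in_flight and not shutdown.is_set():
                process(in_flight.pop(0))
        # Stop fetching; finish what is already held, bounded by the deadline.
        deadline = time.monotonic() + grace - margin
        return drain(in_flight, process, deadline)
    finally:
        teardown.run()
```

In a real worker, `install_handlers()` would be called once from the main thread before `run_worker(...)`. The sketch polls `shutdown.is_set()` rather than blocking on `shutdown.wait()` so that the handler's `set()` call never contends for a lock the interrupted main thread might already hold.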

Output as: (a) sync worker with signal handling, (b) the asyncio variant, (c) the double-signal escalation logic, (d) the SIGTERM integration test.

Bias toward: doing minimal work in handlers, an explicit drain deadline, and idempotent teardown.
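
As a rough picture of what the asyncio variant in item 5 might look like, the sketch below wires `loop.add_signal_handler` to an `asyncio.Event`, waits for tasks up to a timeout, cancels the stragglers, and collects them with `gather(..., return_exceptions=True)`. The names `worker`, `drain`, `serve`, and `main`, the 25-second default, and the use of `asyncio.wait` to separate finished from pending tasks are assumptions made for illustration; a model's actual answer may structure this differently.

```python
import asyncio
import signal


async def worker(name, shutdown, done, step=0.01):
    """Hypothetical worker: completes one unit of work per loop iteration."""
    while not shutdown.is_set():
        await asyncio.sleep(step)   # stand-in for real I/O
        done.append(name)


async def drain(tasks, timeout):
    """Give tasks `timeout` seconds to finish, then cancel the stragglers.

    Returns (finished_count, cancelled_count).
    """
    if not tasks:
        return 0, 0
    finished, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    # return_exceptions=True collects CancelledError instead of raising it.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(finished), len(pending)


async def serve(shutdown, done, drain_timeout=25.0, workers=2):
    """Run workers until `shutdown` is set, then drain them."""
    tasks = [
        asyncio.create_task(worker(f"w{i}", shutdown, done))
        for i in range(workers)
    ]
    await shutdown.wait()
    return await drain(tasks, drain_timeout)


async def main():
    shutdown = asyncio.Event()
    loop = asyncio.get_running_loop()
    # add_signal_handler is only available on Unix event loops.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, shutdown.set)
    done = []
    drained, abandoned = await serve(shutdown, done)
    print(f"drained={drained} abandoned={abandoned}")
```

Running `asyncio.run(main())` starts the workers and blocks until SIGTERM or SIGINT arrives. Splitting `serve` from `main` keeps the drain logic testable without delivering a real signal: a test can set the event directly.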