AI for Bash & Python Automation

Bash Signal Handling and Graceful Daemon Shutdown Prompt

Make a long-running bash loop or worker handle SIGTERM/SIGINT gracefully — finish the in-flight unit of work, release locks, flush state, and exit with the right code — so container stops and systemd restarts never corrupt data.

Target user: Engineers running bash workers under systemd or containers
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior engineer who has debugged data corruption caused by a `docker stop` or `systemctl restart` killing a bash worker mid-write. You know that SIGTERM is a request to finish up, not a license to drop everything.

I will provide:
- The long-running script (a `while true` poll loop, a queue consumer, a batch worker)
- What a "unit of work" is and what mid-unit interruption would corrupt
- How it's supervised (systemd, Docker, k8s, bare nohup)

Your job:

1. **Trap the right signals** — `trap on_term TERM INT` to request shutdown; explain that SIGKILL (9) cannot be caught, so the goal is to exit cleanly BEFORE the supervisor's grace period (`TimeoutStopSec` / k8s `terminationGracePeriodSeconds`) elapses and it escalates to KILL.

2. **Cooperative shutdown flag** — the trap sets a `SHUTDOWN=1` flag rather than exiting immediately; the main loop checks the flag at safe points (between units of work) and breaks. As long as a unit finishes within the supervisor's grace period, no unit is abandoned half-done.

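3. **Beware blocking calls** — a foreground `sleep` delays the trap handler until it returns, and a blocking `read` likewise keeps the loop from reaching its next flag check. Show the pattern of backgrounding the `sleep` and `wait`-ing on its PID (the `wait` builtin returns as soon as a trapped signal arrives), or a short sleep plus a flag re-check, so shutdown starts promptly instead of after the full interval.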

4. **Finish-or-abort policy** — decide per workload: finish the current unit then exit (preferred for correctness), or checkpoint progress and requeue. Make in-progress work safe to resume.

5. **Cleanup on exit** — combine with an `EXIT` trap to release `flock`, remove pidfiles/temp files, and flush logs, preserving the exit code. Note that EXIT and the signal traps must cooperate, not double-clean.

6. **Right exit codes** — exit 0 on a clean requested shutdown so systemd records a normal stop as a success rather than a failure; reserve non-zero for real failures. Mention `SuccessExitStatus=` if the worker has to exit with another code (for example 143) on a requested stop.

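7. **systemd/container hints** — recommend `KillSignal=SIGTERM` (the systemd default) and a `TimeoutStopSec` longer than the slowest unit of work. For containers, make sure the signal actually reaches the script: start it with the exec form of `ENTRYPOINT`/`CMD` so it runs as PID 1, or run it under an init such as `tini` that forwards signals. If a wrapper shell launches the worker, it should `exec` the worker rather than run it as a child, because a plain `sh -c` parent does not forward SIGTERM.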

8. **Verify** — give me a test that starts the worker, sends SIGTERM mid-unit, and asserts the in-flight unit completed and locks were released.

Output: (a) the worker loop with the trap + shutdown flag + interruptible sleep, (b) the EXIT cleanup that cooperates with it, (c) systemd/container config snippet, (d) the SIGTERM mid-unit test. Bias toward finishing the current unit cleanly and exiting before the grace period over abrupt exit.
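
For orientation, here is a minimal sketch of the kind of answer this prompt is after: a worker loop with a shutdown flag, an `EXIT` cleanup that cooperates with it, and a SIGTERM mid-unit test. Everything named here (`worker`, `process_unit`, `WORKDIR`, `done.log`, the `lock` directory) is a hypothetical placeholder, and a `mkdir`-based lock stands in for `flock` only to keep the sketch portable. A real worker would differ in its unit of work and locking.

```bash
#!/usr/bin/env bash
# Sketch only. worker, process_unit, WORKDIR, done.log and the lock directory
# are hypothetical names; a mkdir-based lock stands in for flock.

WORKDIR=$(mktemp -d)
SHUTDOWN=0

on_term() { SHUTDOWN=1; }              # record the request; never exit from here

cleanup() {                            # EXIT trap: the single place cleanup happens
  rc=$?                                # preserve the exit code
  [ -n "${sleep_pid:-}" ] && kill "$sleep_pid" 2>/dev/null
  rmdir "$WORKDIR/lock" 2>/dev/null    # release the lock
  exit "$rc"
}

process_unit() {                       # stand-in unit of work (about one second)
  echo "start $1" >> "$WORKDIR/done.log"
  sleep 1                              # the trap is deferred until this returns
  echo "finish $1" >> "$WORKDIR/done.log"
}

worker() {
  trap on_term TERM INT
  mkdir "$WORKDIR/lock" || exit 1      # acquire the lock before installing cleanup
  trap cleanup EXIT
  n=0
  while [ "$SHUTDOWN" -eq 0 ]; do
    n=$((n + 1))
    process_unit "$n"
    [ "$SHUTDOWN" -eq 0 ] || break     # safe point: between units
    sleep 30 & sleep_pid=$!
    wait "$sleep_pid"                  # interruptible idle wait
  done
  exit 0                               # a requested shutdown is a success
}

# Test: send SIGTERM mid-unit, then check the unit finished and the lock is gone.
worker &
worker_pid=$!
until [ -s "$WORKDIR/done.log" ]; do sleep 0.1; done   # unit 1 has started
kill -TERM "$worker_pid"
wait "$worker_pid"; status=$?

log=$(cat "$WORKDIR/done.log")
if [ -d "$WORKDIR/lock" ]; then lock_released=no; else lock_released=yes; fi
rm -rf "$WORKDIR"
echo "status=$status lock_released=$lock_released"
echo "$log"
```

The test signals only the worker's own PID, so the child `sleep` inside the unit keeps running and the unit completes. That is not guaranteed under every supervisor: with systemd's default `KillMode=control-group` the stop signal goes to every process in the unit's cgroup, and Ctrl-C delivers SIGINT to the whole foreground process group, so a unit of work that spawns children may need `KillMode=mixed` or its own signal handling.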

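The interruptible-sleep pattern from step 3 can also be checked in isolation. The sketch below assumes a hypothetical `poller` function with a 30-second idle interval (`IDLE_SECONDS` is an illustrative name) and measures how long the loop takes to stop after SIGTERM arrives during the idle wait.

```bash
#!/usr/bin/env bash
# Sketch: `sleep N & wait $!` lets a trapped SIGTERM cut the idle wait short.
# poller and IDLE_SECONDS are hypothetical names used for illustration.

IDLE_SECONDS=30
SHUTDOWN=0
on_term() { SHUTDOWN=1; }

poller() {
  trap on_term TERM
  while [ "$SHUTDOWN" -eq 0 ]; do
    # (a unit of work would run here)
    sleep "$IDLE_SECONDS" &            # background the sleep...
    sleep_pid=$!
    wait "$sleep_pid"                  # ...so wait returns once TERM is trapped
  done
  kill "$sleep_pid" 2>/dev/null        # do not leave the sleep running
  exit 0
}

poller &
poller_pid=$!
sleep 1                                # let the poller reach its idle wait
started=$(date +%s)
kill -TERM "$poller_pid"
wait "$poller_pid"; status=$?
elapsed=$(( $(date +%s) - started ))
echo "exit status: $status, stopped after ${elapsed}s (idle interval: ${IDLE_SECONDS}s)"
```

With a foreground `sleep "$IDLE_SECONDS"` in place of the background-and-wait pair, the same loop would be expected to keep running until the sleep finished, which is the delay the pattern is meant to avoid.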