Bash Signal Handling and Graceful Daemon Shutdown Prompt
Make a long-running bash loop or worker handle SIGTERM/SIGINT gracefully — finish the in-flight unit of work, release locks, flush state, and exit with the right code — so container stops and systemd restarts never corrupt data.
- Target user: Engineers running bash workers under systemd or containers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior engineer who has debugged data corruption caused by a `docker stop` or `systemctl restart` killing a bash worker mid-write. You know that SIGTERM is a request to finish up, not a license to drop everything.

I will provide:
- The long-running script (a `while true` poll loop, a queue consumer, a batch worker)
- What a "unit of work" is and what mid-unit interruption would corrupt
- How it's supervised (systemd, Docker, k8s, bare nohup)

Your job:
1. **Trap the right signals**: use `trap on_term TERM INT` to request shutdown; explain that SIGKILL (9) cannot be caught, so the goal is to exit cleanly BEFORE the supervisor's grace period (`TimeoutStopSec` / k8s `terminationGracePeriodSeconds`) elapses and it escalates to KILL.
2. **Cooperative shutdown flag**: the trap sets a `SHUTDOWN=1` flag rather than exiting immediately; the main loop checks the flag at safe points (between units of work) and breaks. This guarantees no unit is abandoned half-done.
3. **Beware blocking calls**: bash runs a trap only after the current foreground command returns, so a foreground `sleep` or a blocking `read` may not see the signal until it returns. Show the pattern of running `sleep` in the background and waiting on it with `wait` (or a short sleep plus a flag re-check) so the signal interrupts the wait promptly instead of after the full interval.
4. **Finish-or-abort policy**: decide per workload whether to finish the current unit and then exit (preferred for correctness) or to checkpoint progress and requeue. Make in-progress work safe to resume.
5. **Cleanup on exit**: combine with an `EXIT` trap to release `flock`, remove pidfiles/temp files, and flush logs, preserving the exit code. Note that the EXIT trap and the signal traps must cooperate so cleanup runs exactly once.
6. **Right exit codes**: exit 0 on a clean requested shutdown so systemd doesn't treat a normal stop as a failure; reserve non-zero for real failures. Mention `SuccessExitStatus` if needed.
7. **systemd/container hints**: recommend `KillSignal=SIGTERM` and a sane `TimeoutStopSec`; for containers, make sure signals actually reach the script by running it as PID 1 (`exec` the worker directly, or from a wrapper shell that `exec`s it) or under an init such as `tini` that forwards signals.
8. **Verify**: give me a test that starts the worker, sends SIGTERM mid-unit, and asserts the in-flight unit completed and locks were released.

Output: (a) the worker loop with the trap + shutdown flag + interruptible sleep, (b) the EXIT cleanup that cooperates with it, (c) systemd/container config snippet, (d) the SIGTERM mid-unit test. Prefer finishing the current unit cleanly and exiting before the grace period over an abrupt exit.
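For orientation, a response to this prompt might resemble the minimal sketch below. It is an illustration under stated assumptions, not a definitive implementation: `STATE_DIR`, `PIDFILE`, `POLL_INTERVAL`, `process_unit`, and `run_worker` are hypothetical names, the one-second `sleep` stands in for real work, and a pidfile stands in for the `flock` a real worker would release. The last few lines are a small self-test that sends SIGTERM while the first unit is in flight.

```shell
#!/usr/bin/env bash
# Sketch only: all names, paths, and intervals below are illustrative assumptions.
set -u

STATE_DIR="${STATE_DIR:-$(mktemp -d)}"
PIDFILE="$STATE_DIR/worker.pid"
POLL_INTERVAL="${POLL_INTERVAL:-30}"
SHUTDOWN=0

# Signal trap: only record the request; the loop decides when it is safe to stop.
on_term() { SHUTDOWN=1; }

# EXIT trap: single cleanup path that preserves the exit status.
cleanup() {
  local rc=$?                  # status the shell is exiting with
  trap - EXIT TERM INT         # make sure cleanup cannot run twice
  rm -f "$PIDFILE"
  exit "$rc"
}

# Sleep in the background and wait on it: a trapped signal makes `wait`
# return immediately, whereas a foreground sleep would delay the trap.
interruptible_sleep() {
  sleep "$1" &
  local pid=$!
  wait "$pid" 2>/dev/null
  kill "$pid" 2>/dev/null || true
}

# Hypothetical unit of work: write to a temp file, then publish atomically.
process_unit() {
  local n=$1
  echo "unit $n" >"$STATE_DIR/unit-$n.tmp"
  sleep 1                      # stand-in for real work (runs in the foreground)
  mv "$STATE_DIR/unit-$n.tmp" "$STATE_DIR/unit-$n.done"
}

run_worker() {
  trap on_term TERM INT
  trap cleanup EXIT
  echo "$BASHPID" >"$PIDFILE"
  local n=0
  while (( SHUTDOWN == 0 )); do
    n=$((n + 1))
    process_unit "$n"          # a signal arriving here is deferred until the unit ends
    (( SHUTDOWN )) && break    # safe point: between units
    interruptible_sleep "$POLL_INTERVAL"
  done
  exit 0                       # requested shutdown counts as success
}

# Self-test: start the worker, send TERM mid-unit, then inspect the result.
( run_worker ) &
worker=$!
until [[ -e "$STATE_DIR/unit-1.tmp" ]]; do sleep 0.1; done   # unit 1 is in flight
kill -TERM "$worker"
wait "$worker"
rc=$?
echo "worker exit status: $rc"
[[ -f "$STATE_DIR/unit-1.done" ]] && echo "in-flight unit completed"
[[ ! -e "$PIDFILE" ]] && echo "pidfile removed"
```

In real use the self-test at the bottom would be replaced by a plain `run_worker` call. The sketch relies on only the bash process receiving the signal. A supervisor that signals every process in the group or cgroup (systemd's default `KillMode=control-group`, for example) also signals the in-flight child command, so a setting such as `KillMode=mixed` may be needed for the unit to finish.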