Idempotent systemd-Timer Maintenance Job Prompt
Write a safe, idempotent maintenance job (script plus systemd service and timer units) that can run on a schedule, survive overlaps and missed runs, and never corrupt state when run twice.
- Target user: SREs and Linux admins scheduling recurring automation
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who builds scheduled jobs that are safe to run repeatedly, concurrently, or after a missed window. Correctness under re-runs matters more than cleverness.

I will provide:
- What the job does (e.g., prune old artifacts, sync a cache, rotate keys)
- The schedule and tolerance for missed runs (must it catch up after downtime?)
- The state it reads/writes and what "already done" looks like for this run

Your job:
1. **Make the work idempotent**: design each step to check current state before acting (exists? already pruned? checksum matches?) so a second run is a no-op, not a duplicate.
2. **Prevent overlap**: guard the script with `flock` (or rely on systemd's single-instance semantics: a timer does not start a service unit that is still running) so a slow run can't collide with the next trigger. Do not use `RefuseManualStart=` for this; it only blocks manual `systemctl start` and would break the verification step.
3. **Write the units**: produce a `Type=oneshot` `.service` and a `.timer` with `OnCalendar=`, `Persistent=true` for catch-up, and a small `RandomizedDelaySec=` to avoid thundering herds across hosts.
4. **Run least-privilege**: set `User=`, and sandbox with `ProtectSystem=`, `ProtectHome=`, `PrivateTmp=`, and `ReadWritePaths=` scoped to exactly what it touches.
5. **Add a dry-run + logging**: support `--dry-run`, log to the journal with clear start/skip/done lines, and exit non-zero on real failure so systemd marks the unit failed.
6. **Verify**: show `systemd-analyze verify`, a manual `systemctl start`, two back-to-back runs proving the second is a no-op, and how `Persistent=true` triggers a catch-up after boot.

Output: the script, both unit files, install/enable commands, and the verification transcript.
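For orientation, here is a minimal sketch of the kind of script core the prompt asks for, covering the idempotency check, the `flock` overlap guard, and the dry-run and logging lines. It assumes a prune-style job; the function names `log`, `prune_once`, and `run_locked`, the `myjob` log prefix, and the `*.tmp` file pattern are all hypothetical and not part of the prompt.

```shell
#!/bin/sh
# Hypothetical helpers for a prune job; names and patterns are illustrative only.

# Under systemd, stdout is captured by the journal by default.
log() { printf '%s\n' "myjob: $*"; }

# prune_once DIR DAYS [DRY_RUN]
# Removes *.tmp files matched by find's -mtime +DAYS test under DIR.
# State is checked first, so a second run finds nothing and is a no-op.
prune_once() {
  dir=$1 days=$2 dry_run=${3:-0}
  if [ ! -d "$dir" ]; then
    log "fail: $dir is not a directory"
    return 1
  fi
  log "start: pruning $dir (older than ${days}d)"
  # Count what still needs doing (file names containing newlines would miscount).
  matches=$(find "$dir" -type f -name '*.tmp' -mtime +"$days" -print | wc -l | tr -d '[:space:]')
  if [ "$matches" -eq 0 ]; then
    log "skip: nothing to prune"
    return 0
  fi
  if [ "$dry_run" -eq 1 ]; then
    log "dry-run: would remove $matches file(s)"
    return 0
  fi
  if ! find "$dir" -type f -name '*.tmp' -mtime +"$days" -exec rm -f -- {} +; then
    log "fail: could not remove all matching files"
    return 1
  fi
  log "done: removed $matches file(s)"
}

# run_locked LOCK_FILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive, non-blocking lock on fd 9.
# If another run holds the lock, logs a skip and exits 0 instead of waiting.
run_locked() {
  lock_file=$1
  shift
  (
    if ! flock -n 9; then
      log "skip: another run holds $lock_file"
      exit 0
    fi
    "$@"
  ) 9>"$lock_file"
}
```

A script built on this sketch might end with a call such as `run_locked /run/lock/myjob.lock prune_once /var/cache/myjob 7`, where both paths are placeholders. Treating lock contention as a skip (exit 0) rather than a failure is a design choice; a job that must never miss a run might prefer a blocking `flock` with a timeout.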