You are a senior Linux sysadmin who has debugged hundreds of systemd unit failures across Ubuntu, RHEL, and Debian. You can read dependency graphs, decode exit codes, and spot the "drop-in override changed everything" trap.

I will provide:
- The failing unit name and what it's supposed to do
- `systemctl status <unit>` output
- `journalctl -u <unit> --no-pager -n 100` output
- `systemctl cat <unit>` (full effective unit file including drop-ins)
- Whether the failure is at boot or runtime; first occurrence or repeat
- Distro + systemd version

Your job:

1. **Read the status carefully**:
   - Active state (`active`, `inactive`, `failed`, `activating`, `deactivating`) + sub-state
   - Result (`exit-code`, `signal`, `core-dump`, `timeout`, `watchdog`, `protocol`, `oom-kill`, `start-limit-hit`, `resources`)
   - Exit status: 0 = clean; 1–199 = usually app-defined (systemctl may add a generic label such as `1/FAILURE`); 200 and up = systemd's own setup failures (e.g. `203/EXEC`, `217/USER`; see systemd.exec(5)); 128+N from a wrapper script usually means its child was killed by signal N. A process killed directly by a signal shows `code=killed, signal=<NAME>` (or `code=dumped`) instead of an exit status.
   - Active/inactive timestamps → flapping vs first failure
2. **Walk the dependency chain** from `systemctl list-dependencies <unit>` and `systemctl list-dependencies --reverse <unit>`:
   - Did a `Requires=`/`BindsTo=` dependency fail ("Dependency failed for ..." in the journal), or is an `After=` ordering missing so the unit started too early?
   - Was a network/mount target not reached?
   - Is there an ordering cycle? (`systemd-analyze verify <unit>`; the journal also logs "Found ordering cycle")
3. **Decode the journal output**:
   - Exit status mapping (`status=...`)
   - Common signal kills: SIGKILL (9) = OOM, `kill -9`, or systemd escalating after `TimeoutStopSec=`; SIGTERM (15) = stopped/restarted; SIGSEGV (11) = app crash; SIGABRT (6) = `abort()`/assertion, or a watchdog timeout (SIGABRT is the default `WatchdogSignal=`)
   - `(code=killed, signal=KILL)` with an OOM-killer banner in the kernel log means an OOM kill (cgroup or global)
   - `Watchdog timeout` = the service didn't send `WATCHDOG=1` via `sd_notify` within `WatchdogSec=`
4. **Check effective config including drop-ins**:
   - `systemctl cat` shows ALL fragments (base unit + `<unit>.d/*.conf` drop-ins from `/etc/systemd/system`, `/run/systemd/system`, and `/usr/lib/systemd/system`)
   - Misnamed drop-ins (e.g., `override.cnf` instead of `.conf`) are silently ignored
   - `Environment=` order matters; for the same variable, the later assignment wins
5. **Common root causes to check**:
   - `ExecStart=` binary path wrong, or `User=` doesn't exist
   - `WorkingDirectory=` doesn't exist
   - `ReadOnlyPaths=` blocks a required write path
   - `ProtectSystem=strict` + app writes to `/etc` → writes fail with "Read-only file system", and the app exits with its own, often cryptic, code
   - Crash loop + `Restart=` with a low `RestartSec=` → more than `StartLimitBurst=` starts within `StartLimitIntervalSec=` → stuck in `start-limit-hit`
   - Missing `Wants=` + `After=network-online.target` for a network-dependent service that crashes early
   - Hardware/mount dependency: a failed `.mount` unit cascading
6. **For boot-time failures**: run `systemd-analyze blame` and `systemd-analyze critical-chain`, and check whether boot dropped into `emergency.target` or `rescue.target`.
7. **Suggest the recovery path**:
   - Reset start-limit state (`systemctl reset-failed <unit>`)
   - Reload after hand-editing unit files (`systemctl daemon-reload`)
   - Override safely with `systemctl edit <unit>` (creates a drop-in; never edit packaged unit files)

Mark anything DESTRUCTIVE clearly (mask, force-stop while dependents run, daemon-reload during active deployment).

---

Unit name: [e.g., myapp.service / mnt-data.mount / postgresql.service]
Failure context: [boot / runtime / repeat / first-time]
Distro + systemd version: [e.g., Ubuntu 22.04, systemd 249]

`systemctl status <unit>` (with -l):
```
[PASTE]
```

`journalctl -u <unit> -n 100 --no-pager`:
```
[PASTE]
```

`systemctl cat <unit>` (effective config):
```
[PASTE]
```

Any related units that also failed:
```
[PASTE]
```

Why this prompt works

systemd failures hide the actual error behind multiple layers: the unit state, the journal output, the dependency graph, and the effective drop-in configuration. Exit code 217, for example, means systemd could not set up the User= credentials (usually because the user doesn’t exist), but systemctl status only shows the terse label status=217/USER. This prompt forces the model to decode each layer.

How to use it

Always paste systemctl cat <unit> — not the original unit file. Drop-ins in /etc/systemd/system/<unit>.d/ can flip critical behavior and the base file alone misleads.
Paste at least 100 lines of journalctl. The first real error often appears well before the final failure message, so a short tail can miss it.
Mention the symptom timing: at boot? after a deploy? randomly every 4 hours? Time pattern is diagnostic.
If the unit has dependents, include their status too. Sometimes the “failing” unit is just the visible one in a chain.
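Collecting those inputs by hand is easy to get wrong under pressure. The helper below is a minimal sketch, assuming bash and a user who can read the journal (root or the systemd-journal group); collect_unit_report is a hypothetical name. It runs each command in turn, labels its output, and keeps going when a command fails, since systemctl status exits non-zero for failed units.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every input the debugging prompt asks for,
# one labelled section per command, ready to paste.
collect_unit_report() {
    local unit="$1" lines="${2:-100}"   # the prompt wants at least 100 lines
    [ -n "$unit" ] || { echo "usage: collect_unit_report UNIT [LINES]" >&2; return 2; }

    # One command per entry; the unquoted $cmd below splits it into words.
    local cmds=(
        "systemctl status $unit -l --no-pager"
        "journalctl -u $unit -n $lines --no-pager"
        "systemctl cat $unit --no-pager"
        "systemctl list-dependencies $unit --no-pager"
        "systemctl list-dependencies --reverse $unit --no-pager"
        "systemctl --failed --no-pager"
        "systemctl --version"
        "cat /etc/os-release"
    )
    local cmd
    for cmd in "${cmds[@]}"; do
        printf '===== %s =====\n' "$cmd"
        # Keep going on failure (systemctl status exits non-zero for failed
        # units); stderr is folded into the report so errors stay visible.
        $cmd 2>&1 || true
        printf '\n'
    done
}
```

Usage: collect_unit_report myapp.service 200 > myapp-report.txt, then paste each section under the matching placeholder.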

Useful commands

# Full picture of one unit
systemctl status <unit> -l
systemctl cat <unit>
systemctl show <unit> | less
journalctl -u <unit> -n 100 --no-pager
journalctl -u <unit> --since "1 hour ago" --no-pager
journalctl -u <unit> -p err --no-pager     # errors only

# Dependency analysis
systemctl list-dependencies <unit>
systemctl list-dependencies --reverse <unit>
systemd-analyze verify <unit>
systemd-analyze dot <unit> | dot -Tsvg > deps.svg   # graphviz install required

# Boot analysis
systemd-analyze
systemd-analyze blame | head -30
systemd-analyze critical-chain
systemd-analyze plot > boot.svg

# Edit safely
sudo systemctl edit <unit>          # creates override.conf drop-in and reloads on exit
sudo systemctl daemon-reload        # MANDATORY after hand-editing unit files
sudo systemctl restart <unit>

# Replace the entire unit (full copy in /etc/systemd/system/ that shadows the packaged file and its future updates)
sudo systemctl edit --full <unit>

# Reset state
sudo systemctl reset-failed <unit>
sudo systemctl reset-failed         # all units in the failed state

# Find ALL drop-ins for a unit
ls -la /etc/systemd/system/<unit>.d/
ls -la /run/systemd/system/<unit>.d/
ls -la /usr/lib/systemd/system/<unit>.d/

# Check a unit file for errors without restarting it
systemd-analyze verify /etc/systemd/system/<unit>
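systemctl show dumps every property; for scripts or alert rules, a handful of them gives the same verdict as systemctl status in a stable key=value form. A sketch, assuming bash; summarize_unit_show is a hypothetical name, and it assumes ExecMainCode carries the kernel's child-status code (1 = exited, 2 = killed, 3 = dumped core) and ExecMainStatus the exit status or signal number. NRestarts only exists on newer systemd releases, so the summary prints restarts=? when it is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `systemctl show -p ...` output (read on stdin)
# into a one-line summary.
summarize_unit_show() {
    local key value
    local active="" sub="" result="" code="" status="" restarts=""
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState)    active="$value" ;;
            SubState)       sub="$value" ;;
            Result)         result="$value" ;;
            ExecMainCode)   code="$value" ;;
            ExecMainStatus) status="$value" ;;
            NRestarts)      restarts="$value" ;;
        esac
    done

    # Assumed mapping of ExecMainCode: 1 exited, 2 killed, 3 dumped core.
    local how
    case "$code" in
        1) how="exited status=$status" ;;
        2) how="killed signal=$(kill -l "$status" 2>/dev/null || echo "$status")" ;;
        3) how="dumped signal=$(kill -l "$status" 2>/dev/null || echo "$status")" ;;
        *) how="no exit recorded" ;;
    esac
    printf '%s/%s result=%s %s restarts=%s\n' \
        "$active" "$sub" "$result" "$how" "${restarts:-?}"
}
```

Usage: systemctl show -p ActiveState,SubState,Result,ExecMainCode,ExecMainStatus,NRestarts myapp.service | summarize_unit_show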

Common exit codes to recognize

Exit code	systemd meaning
0	Clean exit
1	Generic failure
200 and up	systemd-specific: setting up the execution environment failed (full list: systemd.exec(5), “Process Exit Codes”)
200 (`EXIT_CHDIR`)	`WorkingDirectory=` doesn’t exist or isn’t accessible
203 (`EXIT_EXEC`)	`ExecStart=` binary not found / not executable
204 (`EXIT_MEMORY`)	out of memory during setup
208 (`EXIT_STDIN`)	stdin setup failed
209 (`EXIT_STDOUT`)	stdout setup failed
210 (`EXIT_CHROOT`)	`RootDirectory=` failure
216 (`EXIT_GROUP`)	`Group=` lookup or switch failed (usually the group doesn’t exist)
217 (`EXIT_USER`)	`User=` lookup or switch failed (usually the user doesn’t exist)
226 (`EXIT_NAMESPACE`)	namespace setup failed (e.g. a path in `ReadWritePaths=` doesn’t exist)
232 (`EXIT_ADDRESS_FAMILIES`)	applying `RestrictAddressFamilies=` failed
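When only the number is at hand (from a log line or an alert), a lookup like this saves a trip to the man page. It is a sketch, assuming bash; explain_exit_status is a hypothetical name, the systemd codes are taken from the “Process Exit Codes” table in systemd.exec(5) and are not a complete list, and the 128+N branch is a heuristic for wrapper scripts, since an app can return those values on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain the N in "status=N/NAME" from systemctl status.
# Named codes follow systemd.exec(5), "Process Exit Codes"; the list is partial.
explain_exit_status() {
    local n="$1" sig
    [[ "$n" =~ ^[0-9]+$ ]] || { echo "not a number: $n" >&2; return 1; }
    case "$n" in
        0)   echo "clean exit" ;;
        200) echo "EXIT_CHDIR: WorkingDirectory= missing or inaccessible" ;;
        203) echo "EXIT_EXEC: ExecStart= binary not found or not executable" ;;
        204) echo "EXIT_MEMORY: out of memory during setup" ;;
        208) echo "EXIT_STDIN: stdin setup failed" ;;
        209) echo "EXIT_STDOUT: stdout setup failed" ;;
        210) echo "EXIT_CHROOT: RootDirectory= failure" ;;
        216) echo "EXIT_GROUP: Group= lookup or switch failed" ;;
        217) echo "EXIT_USER: User= lookup or switch failed" ;;
        226) echo "EXIT_NAMESPACE: namespace/sandbox setup failed" ;;
        232) echo "EXIT_ADDRESS_FAMILIES: applying RestrictAddressFamilies= failed" ;;
        *)
            if (( n >= 200 )); then
                echo "likely another systemd setup failure; see systemd.exec(5)"
            elif (( n > 128 )); then
                # Shells report "child killed by signal N" as exit status 128+N.
                sig="$(kill -l $((n - 128)) 2>/dev/null)" || sig="$((n - 128))"
                echo "possibly a wrapper script reporting its child was killed by signal $sig"
            else
                echo "application-defined error"
            fi
            ;;
    esac
}
```

Usage: explain_exit_status 217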

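Signal kills are reported separately, as (code=killed, signal=<NAME>), or (code=dumped, signal=<NAME>) when a core dump was written:

SIGTERM (15) → systemd asked to stop
SIGKILL (9) → cgroup OOM, kill -9, or systemd's own SIGKILL after TimeoutStopSec= expired (check dmesg for the OOM banner)
SIGSEGV (11) → app crash (invalid memory access); coredumpctl may have the core if systemd-coredump is in use
SIGABRT (6) → abort()/assert() failure, or a watchdog timeout (SIGABRT is systemd's default WatchdogSignal=)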
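To tell an OOM kill apart from a manual kill -9, search the kernel log for the OOM killer’s report on the process. A sketch, assuming bash; find_oom_kills is a hypothetical name. It matches the “Killed process <pid> (<name>)” wording that OOM reports use, but the surrounding text varies between kernel versions, so an empty result means “not found”, not “definitely no OOM”. The kernel logs the process name truncated to 15 characters, so the helper truncates the same way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kernel-log lines (read on stdin) in which the
# OOM killer reports killing a process with the given name.
find_oom_kills() {
    [ -n "$1" ] || { echo "usage: find_oom_kills NAME" >&2; return 2; }
    # The kernel logs the task name ("comm") cut to 15 characters.
    local comm="${1:0:15}"
    # Global and cgroup OOM reports both contain "Killed process <pid> (<comm>)";
    # -F treats the name literally, so dots or brackets in it are harmless.
    grep -F "Killed process " | grep -F "($comm)"
}
```

Usage: journalctl -k --no-pager | find_oom_kills myapp (or dmesg | find_oom_kills myapp). A non-zero exit status means no matching report was found.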

Common findings this catches

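Exit code 217 → User=appuser set but the user doesn’t exist on this host (the deploy never created it).
code=killed, signal=KILL with no OOM banner → an external kill -9 or systemd's own SIGKILL after TimeoutStopSec=; check the journal for a stop timeout, then audit logs for who sent it.
Service flapping into “start-limit-hit” → Restart=always + a crash loop exceeded StartLimitBurst= starts within StartLimitIntervalSec=. Fix the app, then reset-failed.
(code=exited, status=203/EXEC) → ExecStart= path doesn’t exist or isn’t executable (a script whose shebang interpreter is missing fails the same way). Often follows a package change or deploy that moved the binary.
Unit “active” but app not running, or the app dies right after starting → Type= doesn’t match how the app starts. A self-daemonizing app under Type=simple is stopped as soon as its parent exits (use Type=forking, or run it in the foreground); Type=oneshot with RemainAfterExit=yes shows “active (exited)” with nothing running.
Watchdog timeouts → WatchdogSec= set but the app doesn’t call sd_notify(0, "WATCHDOG=1") within each WatchdogSec= interval.
ProtectSystem=strict + “Read-only file system” errors → app needs to write somewhere outside its allowed paths; add ReadWritePaths=/var/lib/myapp (or use StateDirectory=myapp).
Drop-in override file ignored → wrong filename extension (override.conf is correct; override.cnf is silently ignored), or the .d directory name doesn’t exactly match the unit name.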
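The misnamed-drop-in trap is easy to check mechanically. A sketch, assuming bash; find_ignored_dropins is a hypothetical name, it only looks at the three drop-in directories listed under Useful commands, and the optional second argument is a root prefix for checking a mounted image or a test tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report files in a unit's drop-in directories that
# systemd skips because their names do not end in ".conf".
find_ignored_dropins() {
    local unit="$1" root="${2:-}"   # root: optional prefix, "" = live system
    local dir f
    for dir in /etc/systemd/system /run/systemd/system /usr/lib/systemd/system; do
        [ -d "$root$dir/$unit.d" ] || continue
        for f in "$root$dir/$unit.d"/*; do
            [ -e "$f" ] || continue      # empty directory: the glob stays literal
            case "$f" in
                *.conf) ;;               # loaded by systemd
                *) echo "ignored: $f" ;;
            esac
        done
    done
}
```

Usage: find_ignored_dropins myapp.service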

Safe override pattern

sudo systemctl edit myapp.service

Add only the lines you want to override or extend (note the [Service] header):

[Service]
# Empty ExecStart= clears the inherited value before adding the new one
ExecStart=
ExecStart=/usr/local/bin/myapp --new-flag

Environment=DEBUG=true
TimeoutStartSec=120

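Then (systemctl edit already reloads unit files when the editor closes, so the daemon-reload here is only a safeguard; it is required after editing files by hand):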

sudo systemctl daemon-reload
sudo systemctl restart myapp.service
sudo systemctl status myapp.service
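Where an interactive editor isn’t available (configuration management, a remote script), the same kind of drop-in can be written directly. A sketch, assuming bash and root; write_dropin and the 50-debug name are hypothetical. Unlike systemctl edit it does not reload systemd, so the daemon-reload is mandatory afterwards, and it writes through a temporary file so a half-written drop-in is never picked up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write stdin to <root>/etc/systemd/system/<unit>.d/<name>.conf.
# root is an optional prefix ("" = live system) for testing on a scratch tree.
write_dropin() {
    local unit="$1" name="$2" root="${3:-}"
    [ -n "$unit" ] && [ -n "$name" ] || { echo "usage: write_dropin UNIT NAME [ROOT]" >&2; return 2; }
    local dir="$root/etc/systemd/system/$unit.d" tmp
    case "$name" in
        *.conf) ;;                    # already has the extension systemd requires
        *) name="$name.conf" ;;       # without it the file would be silently ignored
    esac
    mkdir -p "$dir" || return 1
    # Stage in a dot-file (not *.conf, so systemd skips it), then rename
    # atomically within the same directory.
    tmp="$(mktemp "$dir/.$name.XXXXXX")" || return 1
    if cat > "$tmp" && chmod 0644 "$tmp"; then
        mv -f "$tmp" "$dir/$name"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage from a root shell:

write_dropin myapp.service 50-debug <<'EOF'
[Service]
Environment=DEBUG=true
EOF
systemctl daemon-reload
systemctl restart myapp.service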

When to escalate

Boot stuck in emergency.target with no obvious failed unit: get console access; do not reboot blindly.
Failed *.mount unit on a critical filesystem: coordinate with storage; do not edit /etc/fstab over a hung session.
A unit failure that correlates with a kernel taint in dmesg: likely driver/hardware; pull in the platform team.
