Untangling systemd Boot Time with systemd-analyze and AI
Slow boots and tangled service dependencies hide in plain sight. Here's how to read systemd-analyze blame and critical-chain with an AI decoding the graph.
- #linux
- #systemd
- #boot
- #performance
- #dependencies
A server that takes four minutes to come back after a reboot is a server you’re scared to reboot, which means it never gets patched, which means it’s the one that gets owned. Slow boots and tangled service dependencies are quietly corrosive, and systemd ships excellent tools for diagnosing them, systemd-analyze and its subcommands, whose output almost nobody reads in full. That output is a dependency graph and a timing breakdown that reward careful interpretation, and interpretation is exactly where an AI copilot earns its keep. I treat it as a fast junior engineer: it reads the whole chain, names the bottleneck, and explains the ordering, while I make the calls about what to actually change. Here’s the workflow.
Getting the headline number
Start with the summary so you know whether you even have a problem.
systemd-analyze
systemd-analyze time
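This prints how long each boot phase took: kernel and userspace always, plus firmware and loader where the platform reports them (typically UEFI) and initrd where one is used. Bare systemd-analyze implies the time subcommand, so either line works. If userspace dominates, your problem is services, not hardware. That distinction alone saves you from chasing the wrong layer.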
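To make that comparison mechanical, here is a minimal sketch that reads the summary line and prints the largest phase with its time in seconds. The function name dominant_phase is my own, and the parser assumes the usual "Startup finished in ... (phase) + ... = total" line shape, which may vary between systemd versions.

```shell
# dominant_phase: read `systemd-analyze time` output on stdin and print
# the slowest phase and its duration in seconds.
# Usage (on the server): systemd-analyze time | dominant_phase
dominant_phase() {
  awk '
    # Convert one token such as "2min", "3.412s" or "512ms" to seconds.
    function to_seconds(tok) {
      if (tok ~ /ms$/)  { sub(/ms$/, "", tok);  return tok / 1000 }
      if (tok ~ /us$/)  { sub(/us$/, "", tok);  return tok / 1000000 }
      if (tok ~ /min$/) { sub(/min$/, "", tok); return tok * 60 }
      if (tok ~ /h$/)   { sub(/h$/, "", tok);   return tok * 3600 }
      if (tok ~ /s$/)   { sub(/s$/, "", tok);   return tok + 0 }
      return 0
    }
    /^Startup finished in/ {
      secs = 0
      for (i = 4; i <= NF; i++) {
        if ($i == "=") break            # the total follows "="; ignore it
        if ($i == "+") continue
        if ($i ~ /^\(.*\)$/) {          # "(kernel)" closes one phase
          name = substr($i, 2, length($i) - 2)
          if (secs > best) { best = secs; best_name = name }
          secs = 0
        } else {
          secs += to_seconds($i)        # a phase may span tokens: "2min 3.4s"
        }
      }
      printf "%s %.3f\n", best_name, best
    }
  '
}
```

If the answer is firmware or loader, no amount of unit tuning will help; if it is userspace, carry on to blame.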
Finding the slow units
blame sorts every unit by how long it took to initialize.
systemd-analyze blame | head -20
The trap here is that blame ignores ordering — a unit that took 30 seconds may have spent 28 of them waiting for something else, not doing work. This is where I paste the output into an AI with context:
“This is systemd-analyze blame from a server that boots in 3 minutes. Which of these are likely doing real work versus waiting on a dependency? Don’t suggest disabling anything yet — just help me read it.”
The model will flag that a long NetworkManager-wait-online.service is almost always waiting, not working, and that’s a classic boot-time killer. That nudge points you at critical-chain next.
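Before pasting, it can help to trim blame down to the units that matter. This is a rough sketch: slow_units is a name I made up, the 5-second default is arbitrary, and the parser assumes each blame line is a duration followed by a unit name.

```shell
# slow_units [SECONDS]: read `systemd-analyze blame` output on stdin and
# print "seconds unit" for every unit at or above the threshold (default 5).
# Usage (on the server): systemd-analyze blame | slow_units 10
slow_units() {
  awk -v threshold="${1:-5}" '
    # Convert one token such as "1min", "27.480s" or "812ms" to seconds.
    function to_seconds(tok) {
      if (tok ~ /ms$/)  { sub(/ms$/, "", tok);  return tok / 1000 }
      if (tok ~ /us$/)  { sub(/us$/, "", tok);  return tok / 1000000 }
      if (tok ~ /min$/) { sub(/min$/, "", tok); return tok * 60 }
      if (tok ~ /h$/)   { sub(/h$/, "", tok);   return tok * 3600 }
      if (tok ~ /s$/)   { sub(/s$/, "", tok);   return tok + 0 }
      return 0
    }
    NF >= 2 {
      secs = 0
      # Every field except the last is part of the duration.
      for (i = 1; i < NF; i++) secs += to_seconds($i)
      if (secs >= threshold) printf "%.3f %s\n", secs, $NF
    }
  '
}
```

The numbers still say nothing about waiting versus working; the filter only shortens what you and the model have to read.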
Reading the critical chain
critical-chain shows the actual dependency path that determined when boot finished — the bottleneck sequence.
systemd-analyze critical-chain
systemd-analyze critical-chain my-app.service
The time after @ is when the unit became active or started; the time after + is how long the unit took to start, not how long it kept running. The output is a tree, and trees are exactly what AI is good at summarizing. Hand it the chain and ask which single unit, if it started earlier, would shave the most time. It traces the path faster than I can and explains why, for example, network-online.target gating your app is the long pole.
Pro Tip: The most common boot-time win on servers is breaking a hard dependency on network-online.target. Many services don’t actually need the network fully up before they start — ask the AI to help you check whether yours truly does before you change its After=/Wants=.
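As a sanity check on the model’s answer, the single longest start in a chain can be pulled out mechanically. A minimal sketch, assuming the chain lines look like "unit @time +time" behind tree-drawing characters and that unit names begin with a letter or digit; longest_in_chain is a hypothetical helper name.

```shell
# longest_in_chain: read `systemd-analyze critical-chain` output on stdin
# and print the unit with the largest "+" duration, in seconds.
# Usage (on the server): systemd-analyze critical-chain | longest_in_chain
longest_in_chain() {
  awk '
    # Convert one token such as "1min", "27.480s" or "812ms" to seconds.
    function to_seconds(tok) {
      if (tok ~ /ms$/)  { sub(/ms$/, "", tok);  return tok / 1000 }
      if (tok ~ /us$/)  { sub(/us$/, "", tok);  return tok / 1000000 }
      if (tok ~ /min$/) { sub(/min$/, "", tok); return tok * 60 }
      if (tok ~ /h$/)   { sub(/h$/, "", tok);   return tok * 3600 }
      if (tok ~ /s$/)   { sub(/s$/, "", tok);   return tok + 0 }
      return 0
    }
    {
      sub(/^[^A-Za-z0-9]*/, "")   # drop indentation and tree-drawing characters
      unit = $1; secs = 0; in_dur = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^\+/)     { in_dur = 1; secs += to_seconds(substr($i, 2)) }
        else if ($i ~ /^@/) { in_dur = 0 }      # activation time, not a duration
        else if (in_dur)    { secs += to_seconds($i) }
      }
      if (secs > best) { best = secs; best_unit = unit }
    }
    END { if (best_unit != "") printf "%s %.3f\n", best_unit, best }
  '
}
```

The longest step is not always the right one to fix, but if the model names a different unit, that is a good prompt to ask it why.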
Visualizing the dependency graph
For genuinely tangled setups, render the graph and have the AI describe it.
systemd-analyze dot 'my-app.*' | dot -Tsvg > deps.svg
systemd-analyze critical-chain | tee chain.txt
The SVG is for humans; the dot text output and the chain file are what you feed the model. Ask it to summarize the dependency relationships in plain English and flag any cycle or redundant ordering. I keep these graph-reading prompts in my prompt workspace so the team interprets boot graphs the same way.
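Raw dot text carries styling noise the model doesn’t need. The sketch below flattens it to one "A -> B" line per edge; dot_edges is my own name for it, and it assumes edges are emitted as two quoted unit names joined by ->.

```shell
# dot_edges: read `systemd-analyze dot` output on stdin and print each
# dependency edge once as "from -> to", dropping colors and graph boilerplate.
# Usage (on the server):
#   systemd-analyze dot --order   'my-app.*' | dot_edges > ordering-edges.txt
#   systemd-analyze dot --require 'my-app.*' | dot_edges > requirement-edges.txt
dot_edges() {
  sed -n 's/^[[:space:]]*"\([^"]*\)"[[:space:]]*->[[:space:]]*"\([^"]*\)".*$/\1 -> \2/p' |
    sort -u
}
```

Splitting with --order and --require keeps ordering (After=/Before=) separate from requirement edges, which makes a question like "is this ordering redundant?" easier to answer.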
Verifying a change before you trust it
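When the AI suggests, say, that a unit doesn’t need After=network-online.target, you don’t just delete the line and reboot prod. Test the reasoning first. A drop-in override won’t do here: drop-ins can add dependencies, but After= and Wants= can’t be reset to an empty list from one, so removing a dependency means overriding the whole unit. systemctl edit --full puts an editable copy under /etc/systemd/system, which keeps the packaged unit file pristine, and systemctl revert undoes it.
sudo systemctl edit --full my-app.service
# in the copy, under [Unit]: drop network-online.target from After= and Wants=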
sudo systemctl daemon-reload
Then reboot a staging box, not production, and re-run systemd-analyze critical-chain to confirm the chain actually got shorter. The AI proposes the hypothesis; the reboot proves it. That separation — AI suggests, human verifies on non-prod — is the whole discipline. If you want incident-grade tracking of boot regressions across reboots, the monitoring-alerts dashboard can watch boot duration as a metric.
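Before the staging reboot, it is worth confirming that the edit actually changed the effective ordering. A small sketch, with ordered_after as a hypothetical helper that checks a space-separated dependency list such as the one systemctl show prints with --value:

```shell
# ordered_after DEP: succeed if DEP appears in the space-separated list
# of unit names read from stdin, fail otherwise.
# Usage (on the staging box):
#   systemctl show -p After --value my-app.service |
#     ordered_after network-online.target &&
#     echo "still ordered after network-online.target"
ordered_after() {
  tr -s ' ' '\n' | grep -Fxq -- "$1"
}
```

If the message still prints after the edit, something else is contributing the ordering, and rebooting would only confirm that nothing changed.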
When a unit hangs forever
Sometimes boot doesn’t just go slow: a unit hangs and stalls everything behind it. The journal scoped to boot tells the story.
journalctl -b -u my-app.service
journalctl -b | grep -iE 'timed out|timeout'
Paste a hung unit’s journal into the AI and ask what it’s blocked on. It’s fluent in systemd’s “A start job is running for…” messages and will tell you whether it’s a TimeoutStartSec issue, a missing mount, or a dependency that never came up. You confirm against the box and decide the fix.
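To decide which unit’s journal to paste, a rough filter can list the units systemd reported as timing out. This is a sketch under assumptions: timed_out_units is a made-up name, and it expects the default journal line layout where systemd’s own messages read "systemd[PID]: UNIT: message". Exact wording differs across versions, so treat an empty result as inconclusive.

```shell
# timed_out_units: read journal text on stdin and print each unit named in
# a systemd "timed out" message, once.
# Usage (on the server): journalctl -b | timed_out_units
timed_out_units() {
  awk '
    tolower($0) ~ /timed out/ {
      for (i = 1; i <= NF; i++) {
        # Assumed shape: "... systemd[1]: UNIT: ... timed out ..."
        if ($i ~ /^systemd\[[0-9]+\]:$/ && $(i + 1) ~ /:$/) {
          unit = $(i + 1)
          sub(/:$/, "", unit)
          print unit
          break
        }
      }
    }
  ' | sort -u
}
```

Each name it prints is a candidate for journalctl -b -u, which gives the model the full context rather than a single line.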
Spotting services that don’t need to run at all
Half the boot-time battle isn’t ordering — it’s services that have no business starting on this machine. A web server doesn’t need ModemManager; a headless box doesn’t need printer discovery. Each one adds startup time and attack surface.
systemctl list-unit-files --state=enabled | sort
systemctl list-units --type=service --state=running
Paste the enabled-units list into the AI with a description of the server’s actual role — “this is a stateless API box behind a load balancer” — and ask which enabled services are likely unnecessary for that role. It’s good at flagging the usual suspects (ModemManager, cups, avahi-daemon, bluetooth) and explaining what each does so you can make an informed call. You never blindly disable on its say-so; you confirm each one isn’t a hidden dependency, then mask it deliberately.
Pro Tip: Use systemctl mask rather than disable for services you’re certain you never want — masking points the unit at /dev/null so nothing can pull it back in as a dependency. But mask conservatively, and only after the AI-suggested candidates survive your own review, because a masked unit that something quietly needed will fail in a confusing way.
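For a first pass without the model, the usual suspects named above can be matched directly against the enabled list. A minimal sketch; usual_suspects is a hypothetical name, the pattern covers only the four examples from this article, and it assumes the unit name is the first column and the state the second.

```shell
# usual_suspects: read `systemctl list-unit-files` output on stdin and print
# enabled units whose name matches a short list of common server leftovers.
# Usage (on the server):
#   systemctl list-unit-files --state=enabled --no-legend | usual_suspects
usual_suspects() {
  awk '$2 == "enabled" && $1 ~ /^(ModemManager|cups|avahi-daemon|bluetooth)\./ { print $1 }'
}
```

A match is only a candidate. Running systemctl list-dependencies --reverse on each one shows what would pull it in, which is the hidden-dependency check to do before masking anything.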
Comparing boots to catch regressions
Boot time creeping up after an update is a real signal, and the way to catch it is to compare a known-good boot against a current one.
systemd-analyze critical-chain > chain-$(date +%F).txt
diff chain-good.txt chain-$(date +%F).txt
Hand both chain files to the AI and ask what changed — it’ll tell you a new unit inserted itself into the critical path, or an existing one started taking longer. That before/after framing turns a vague “boots feel slower” into a specific unit you can investigate.
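A plain diff of two chain files tends to flag every line, because the timestamps shift on each boot. The sketch below reduces a chain to unit names only, so the diff shows structural changes such as a new unit on the critical path. chain_units is my own helper name, and it assumes chain lines carry an @ timestamp that starts with a digit.

```shell
# chain_units [FILE...]: print just the unit names from saved
# `systemd-analyze critical-chain` output, one per line, in chain order.
# Usage:
#   chain_units chain-good.txt          > good.units
#   chain_units "chain-$(date +%F).txt" > today.units
#   diff good.units today.units
chain_units() {
  awk '
    / @[0-9]/ {
      sub(/^[^A-Za-z0-9]*/, "")   # drop indentation and tree-drawing characters
      print $1
    }
  ' "$@"
}
```

An empty diff means the path is unchanged and the regression is in the timings, which is when the full files are worth handing to the model.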
Keeping it safe
The hard rules don’t bend just because this is “only” boot analysis. The AI reads exported blame, critical-chain, and journal text — it never runs systemctl against your servers, and it never gets credentials. Boot ordering changes can make a server fail to come up cleanly, which is a far worse outage than a slow boot, so every dependency edit is verified on a staging box by a human before it touches prod. Treat the model as a fast junior engineer who’s great at reading graphs and terrible at being trusted with the reboot button.
Conclusion
systemd-analyze already tells you exactly why your boots are slow; the gap was always interpretation, and that’s the gap an AI copilot closes. Get the headline split, find the slow units with blame, trace the real bottleneck with critical-chain, then verify any dependency change on staging before prod. Human owns the edits and the reboots; AI owns the graph reading. More guides live in the Linux admin category, and the Linux admin prompt pack bundles the boot-analysis prompts I reach for first.