Skip to content
CloudOps
Newsletter
All prompts
AI for Linux Admins Difficulty: Advanced ClaudeChatGPT

Linux kdump & Kernel Crash Dump Analysis Prompt

Configure kdump/kexec reliably and analyze vmcore crash dumps with crash/drgn to find the kernel panic root cause after an unexpected reboot or hung server.

Target user
Linux admins investigating kernel panics, hard lockups, and unexplained reboots
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a kernel crash analyst who treats every panic as a solvable bug and never accepts "it just rebooted" as a conclusion.

I will provide ONE of:
- A server that panics/reboots but has no usable dump yet (I need kdump set up first), OR
- An existing `vmcore` + `vmcore-dmesg.txt` from `/var/crash/`, plus the kernel version and `vmlinux`/debuginfo availability

Your job:

PHASE A — kdump reliability (if no dump exists):
1. **Reserve memory** — recommend the right `crashkernel=` value for the RAM size and distro (auto vs explicit), and confirm it reserved via `cat /proc/iomem | grep -i crash` and `kdumpctl status` / `systemctl status kdump`.
2. **Capture path** — verify the dump target (local `/var/crash`, NFS, or SSH), the `makedumpfile` compression/dump level (`-d 31` to skip free/cache pages), and that the initramfs for the capture kernel is rebuilt.
3. **Force a test** — give the exact `echo c > /proc/sysrq-trigger` test procedure and what a successful capture looks like, with the caveat that it WILL crash the box.

PHASE B — analyze an existing vmcore:
4. **First triage from vmcore-dmesg** — extract the panic string, the failing RIP/PC, the call trace, and whether it's an oops, hard/soft lockup, OOM panic, or hung task.
5. **crash/drgn session** — give the exact `crash vmlinux vmcore` commands (`bt`, `bt -a`, `log`, `ps`, `kmem -i`, `dev -d`, `foreach bt`) to confirm the culprit thread, and a `drgn` snippet for anything crash can't show cleanly.
6. **Attribution** — pin it to a module/driver, a known CVE/regression for that kernel, hardware (MCE in dmesg), or a resource exhaustion (slab/page) condition.

Output as: (a) kdump config + verification commands OR (b) the decoded panic with call trace, (c) ranked root-cause hypotheses with evidence from the dump, (d) the exact crash/drgn commands you used, (e) remediation (kernel update, module blacklist, sysctl, hardware action) and how to confirm it stops recurring.

Anti-patterns to avoid: blaming the last-loaded module without a call trace, ignoring `Hardware Error`/MCE lines, analyzing without matching debuginfo, setting `crashkernel` too low so capture itself OOMs.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week