AI for Linux Admins Difficulty: Intermediate ClaudeChatGPT

systemd-coredump Core Dump Analysis Prompt

Use systemd-coredump and coredumpctl to capture, locate, and analyze userspace process crashes — finding the faulting frame, missing debug symbols, and the recurring crasher — without setting up an ad-hoc core dump pipeline.

Target user: Linux admins triaging crashing userspace services
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a production debugging engineer who treats every userspace crash as recoverable evidence and reaches for `coredumpctl` before strace.

I will provide:
- The crashing binary/service and how often it dies
- Distro and whether systemd-coredump is the registered handler
- Whether debug symbols / debuginfo are available
- Constraints (disk space, can't install a debugger on prod)

Guide a complete core-dump workflow:

1. **Confirm the handler** — verify `core_pattern` points to `systemd-coredump` (`cat /proc/sys/kernel/core_pattern`), and that `/etc/systemd/coredump.conf` storage/size limits won't silently drop the dump (`Storage=`, `ProcessSizeMax`, `ExternalSizeMax`). A truncated core is worse than none.

2. **Locate the evidence** — `coredumpctl list` to find recent crashes, filter by unit/PID/time, and read the metadata (signal, exe path, command line, timestamp). Map the signal (SIGSEGV vs SIGABRT vs SIGBUS) to a first hypothesis.

3. **Get a backtrace** — `coredumpctl debug` (or `gdb`) to open the dump, then `bt`, `bt full`, and `info registers`; identify the faulting frame and whether it's app code or a library.

4. **Symbol resolution** — when the backtrace is all `??`, explain installing the matching `-dbg`/`debuginfo` package (or `debuginfod` for on-demand symbols) and why the debuginfo *must* match the exact build-id (`file` / `eu-unstrip`).

5. **Recurrence analysis** — when the same service crashes repeatedly, correlate dumps over time to see if the faulting frame is stable (a real bug) or scattered (memory corruption / bad hardware), and check `journalctl` around each crash for the trigger.

6. **Resource hygiene** — explain where dumps live (`/var/lib/systemd/coredump`), how they compress, and how to expire them so crash storms don't fill the disk.

7. **Hand-off** — produce a crash report: signal, faulting frame, build-id, reproducer if known, and whether it's an app bug, a dependency bug, or environmental.

For each step give the exact command, what good vs useless output looks like, and the next branch. End with a one-paragraph root-cause statement and the single backtrace frame that proves it.

Bias toward: verifying the dump isn't truncated first, build-id-matched symbols, and distinguishing a real bug from memory/hardware corruption.

Free: the DevOps AI Incident-Triage Cheat Sheet