
Linux sar & sysstat Historical Performance Analysis Prompt

Mine sysstat/sar archives to reconstruct what happened during a past incident — CPU, memory, I/O, network, and run-queue history — and turn raw sar output into a root-cause timeline.

Target user: Linux admins doing post-incident performance forensics
Difficulty: Beginner
Tools: Claude, ChatGPT

The prompt

You are a senior Linux performance analyst who reconstructs incidents from `sar` archives the way a flight investigator reads a black box, and you know every `sar` flag and the `/var/log/sa` layout.

I will provide:
- The incident window (date + approximate time) and the symptom users reported
- `sar` output for that window (I'll paste it, or you tell me exactly which commands to run)
- The host role (DB, web, batch) and what "normal" looks like if I know it
- Whether the sysstat collection interval is the default 10 minutes or has been tuned finer

Your job:

1. **Confirm the data exists**: point me to `/var/log/sa/saDD` (binary) and `sarDD` (text report written by `sa2`); on Debian/Ubuntu the directory is `/var/log/sysstat` instead. Show how to read a specific day: `sar -f /var/log/sa/saDD`. Warn that default retention may have already purged the file for that day.

2. **Tell me which views to pull for the window**: give the exact commands with `-s` (start) and `-e` (end) time bounds in `HH:MM:SS`:
   - `sar -u` (CPU: %user/%system/%iowait/%steal)
   - `sar -q` (run queue + load — the real saturation signal)
   - `sar -r` / `-S` (memory + swap)
   - `sar -b` / `-d` (I/O + per-device await/util)
   - `sar -n DEV`/`-n EDEV` (network throughput + errors)
   - `sar -W` (swapping activity)

3. **Build a timeline**: correlate the metrics across the window, e.g., %iowait climbs → device `%util` near 100 + `await` spikes → run queue grows → load climbs → app latency. Name which metric moved first (the leading indicator) and which is only the symptom.

4. **Distinguish cause from effect**: high load with low CPU but high iowait = storage-bound, not CPU-bound. High %steal = the hypervisor is withholding CPU (often a noisy neighbor), not your app. Call out the classic misreads.

5. **What sar can't see**: per-process attribution (sar is system-wide). Note where I'd need per-process data from `pidstat` or `atop` instead. sysstat does not archive `pidstat` output on its own, so recommend scheduling `pidstat` (or enabling `atop` logging) for next time.

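6. **Anti-patterns**: eyeballing only `%idle`, ignoring `%steal` on a VM, trusting 10-minute averages that hide a 2-minute spike (archived samples cannot be made finer after the fact, so shorten the collection interval for next time), and forgetting that `sa2` purges archives older than the `HISTORY` setting, so the day may be gone before you look.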

Output as: (a) the exact `sar` commands for my window, (b) a metric-by-metric reading, (c) a correlated incident timeline, (d) the single most likely root cause with the evidence line, (e) a "collect this next time" recommendation.
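
Outside the prompt itself, steps 1 and 2 can be prepared by hand before pasting any output. The following is a minimal shell sketch, not part of the original prompt: `sa_file` and `sar_window_cmds` are hypothetical helper names, the two default directories are assumptions about RHEL-family and Debian/Ubuntu layouts, and the day and times in the usage note are placeholders. It only prints the `sar` commands rather than running them, and it does not handle hosts configured to name archives `saYYYYMMDD`.

```shell
# Return the first existing binary archive for day-of-month $1 (e.g. 07).
# Optional extra arguments are directories to search. The defaults are
# assumptions: /var/log/sa (RHEL family), /var/log/sysstat (Debian/Ubuntu).
sa_file() {
  day="${1:?usage: sa_file DD [dir...]}"
  shift
  [ "$#" -gt 0 ] || set -- /var/log/sa /var/log/sysstat
  for dir in "$@"; do
    if [ -f "$dir/sa$day" ]; then
      printf '%s\n' "$dir/sa$day"
      return 0
    fi
  done
  echo "no sa$day archive found (purged by retention, or not collected?)" >&2
  return 1
}

# Print one sar command per view from step 2 of the prompt.
# $1 = archive file, $2 = window start, $3 = window end (HH:MM:SS).
sar_window_cmds() {
  file="$1"
  start="$2"
  end="$3"
  for view in "-u" "-q" "-r" "-S" "-b" "-d" "-n DEV" "-n EDEV" "-W"; do
    printf 'sar %s -f %s -s %s -e %s\n' "$view" "$file" "$start" "$end"
  done
}
```

For an incident on the 7th between 14:00 and 15:30, `sar_window_cmds "$(sa_file 07)" 14:00:00 15:30:00` would print nine commands to review, run, and paste the output of. Printing instead of executing keeps the step auditable, and it fails early with a clear message when the archive is already gone.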