Skip to content
CloudOps
All guides
AI for Linux Admins · 7 min read

How to Use Claude to Troubleshoot Linux Servers

A practical, copy-pasteable workflow for using Claude to diagnose production Linux issues — including the prompt structure, what to paste, and what not to.

  • #claude
  • #linux
  • #troubleshooting
  • #sre

Claude is genuinely useful for production Linux troubleshooting — when you use it right. Here’s the workflow that works, after a year of using it on real incidents across Ubuntu, RHEL, and Rocky.

The mental model: Claude is a senior pair, not an oracle

The mistake most engineers make on day one: they paste a 5-line error message and expect a fix. Claude can do better than that — but only if you give it the same context you’d give a senior engineer joining your incident bridge.

A senior engineer would want:

  • What OS and version?
  • What does this server do?
  • What changed recently?
  • What’s the actual symptom?
  • What command output have you already gathered?

Give Claude that, and the quality of analysis changes completely.

The workflow

Step 1: Establish context with a system prompt

Use our Linux Server Troubleshooting Prompt as your system prompt, or paraphrase: “You are a senior Linux sysadmin. Rank root-cause hypotheses by probability. Recommend safe diagnostics first. Label destructive commands as DANGEROUS.”

Step 2: Paste structured context, not noise

Good:

OS: Ubuntu 22.04, kernel 5.15
Role: production MySQL replica, 64GB RAM, 16 cores
Recent changes: kernel upgrade 6 hours ago
Symptom: server load average 40+, MySQL replication lag growing, queries timing out

$ uptime
 14:22:01 up 6:02,  4 users,  load average: 41.23, 38.51, 35.04

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi        58Gi       1.2Gi       128Mi       3.1Gi       1.8Gi

$ iostat -xz 2 3
[...]

Bad:

my server is slow can you help

Step 3: Let it ask follow-up questions

The good prompts in our library tell Claude to ask for missing data before guessing. When it asks “can you share dmesg | tail -50 and vmstat 1 5?” — that’s a feature, not a flaw. Give it the data.

Step 4: Validate suggested commands before running

Claude will sometimes suggest a command with subtly wrong syntax, a destructive flag, or a path that doesn’t exist on your distro. Read every suggestion before running. Never paste straight into a root shell.

Step 5: Keep the conversation alive

Claude’s long context means you can run a 30-minute diagnostic session in one thread, paste new output as you gather it, and the model retains the full diagnostic context. This is the single biggest workflow win versus older AI tools.

What Claude is good at

  • Reading command output you don’t fully understand (strace, perf, tcpdump summaries).
  • Drafting awk/sed/grep one-liners for log analysis.
  • Explaining why a specific kernel parameter or sysctl is set.
  • Suggesting what to look at next when you’re stuck.
  • Drafting the incident summary after you’ve fixed it.

What Claude is not good at

  • Real-time anything — it can’t see your live metrics.
  • Distinguishing between two plausible root causes when both fit the symptoms (it’ll guess).
  • Telling you what’s normal for your environment. You have to provide that baseline.

A real-world example

A production server’s load average suddenly spiked. Pasting top, iostat -xz 2 3, and dmesg | tail -50 into Claude with our prompt template, it immediately flagged: %iowait is 78%, await on /dev/sda is 320ms, and dmesg shows ‘task X blocked for more than 120 seconds.’ The disk subsystem is saturated, not CPU. Investigate which process is doing heavy I/O: iotop -oP -d1 will show the writer in 1-second intervals.”

That’s exactly the diagnosis we wanted, framed with the evidence — in seconds.

Companion resources

Newsletter

Get weekly AI CloudOps workflows

Practical prompts, automation ideas, and tool reviews for infrastructure engineers. One email per week. No spam.