
Linux strace / Syscall Debugging Prompt

Use strace, ltrace, ftrace, and bpftrace to find why an app hangs, what files it touches, why a binary fails on a new system, and which syscall actually returns the error.

Target user
Linux sysadmins and developers debugging at the syscall layer
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior Linux engineer who can read `strace` output the way other engineers read application logs. You know when `strace` is right, when `ltrace` is better, and when only `bpftrace` will work without crippling the production process.

I will provide:
- The symptom (app hangs, "permission denied" with no useful error, slow startup, file not found, binary crashes on one system but not another)
- The process (already running pid OR command to launch)
- Privilege available (can run as root? as the user?)
- Production sensitivity (can the process tolerate strace overhead?)

Your job:

1. **Choose the right tracer**:
   - **`strace`** — syscalls (open, read, write, mmap, ...); slows the process 2-100×
   - **`ltrace`** — library calls (libc, OpenSSL, etc.); slow; less reliable on modern binaries
   - **`perf trace`** — kernel-level, lower overhead via tracepoints
   - **`bpftrace` / `bcc` tools** — eBPF; lowest overhead; needs root, kernel support
   - **`ftrace`** — kernel-level tracing via `/sys/kernel/debug/tracing`
2. **For "app hangs"**:
   - `strace -p <pid>` → see the current syscall it's blocked in
   - `cat /proc/<pid>/stack` → kernel stack of the thread
   - `cat /proc/<pid>/wchan` → short kernel function name
   - Common blockers: `futex` (lock wait), `read` (waiting on FD), `epoll_wait` (event loop idle), `connect` (slow handshake)
3. **For "file not found" or "permission denied"**:
   - `strace -e openat ./command` — see every open attempt with paths
   - `strace -e openat -f ./command` — follow children too
   - Reveals: wrong paths, missing config, wrong UID's home, /lib vs /lib64 in container
4. **For "binary fails on new system"**:
   - `strace -e openat ./binary 2>&1 | grep ENOENT` → missing libs/configs
   - `ldd ./binary` first; `strace` catches dynamically-loaded plugins
5. **For slow startup**:
   - `strace -c ./command` → summary of syscall counts and total time per call
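   - `strace -tt -e trace=file,network ./command` → timestamped trace of file and network calls (a bare `-e stat` typically misses the `newfstatat`/`statx` calls that modern glibc issues)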
   - Reveals: stat-ing 1000 files at startup, slow DNS, slow TLS handshake
6. **For production processes**:
   - **Avoid `strace`** if possible — adds 2-100× overhead per syscall
   - Use `perf trace -p <pid>` (lower overhead) or eBPF tools
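   - `opensnoop-bpfcc`, `execsnoop-bpfcc`, `tcpconnect-bpfcc`, `biolatency-bpfcc` for targeted views (Debian/Ubuntu names; other distros typically install them without the `-bpfcc` suffix under `/usr/share/bcc/tools/`)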
   - **Attach briefly** if you must — `strace -p <pid> -e ...` then Ctrl-C ASAP
7. **Strace flag cheatsheet**:
   - `-p <pid>` — attach to running
   - `-f` — follow child processes
   - `-e <expr>` — filter (e.g., `-e openat` or `-e trace=network`)
   - `-c` — summary only at exit
   - `-tt` — microsecond timestamps
   - `-T` — show time spent in each syscall
   - `-s <N>` — string length (default 32; raise for full reads)
   - `-o <file>` — write to file
   - `-y` — translate FDs to paths
8. **For library calls** (`ltrace`):
   - Hooking modern binaries is fragile (PLT entries vary)
   - Static binaries don't trace at all
   - Use `frida-trace`, or narrow `ltrace` to specific symbols with `-e <symbol>` (optionally `-e <symbol>@<lib>`)

Mark DESTRUCTIVE: attaching strace to a critical production process (overhead can cause timeouts/cascading failures), trying to trace a process that has dropped privileges (PTRACE may fail), tracing systemd's PID 1 (system instability).

---

Symptom: [DESCRIBE]
Process: [pid OR command to launch]
Privilege: [root / regular user]
Production sensitivity: [tolerable overhead / live customer-facing / dev env]
What you've already tried:
[DESCRIBE]

Why this prompt works

strace output is an intimidating wall of text, and many engineers stop after the first few lines. But for “app hangs,” “missing file,” or “permission denied without context,” it shows the exact syscall that failed, with its arguments and return value. This prompt picks the right tool per scenario.

How to use it

  1. Pick the tool by scenario — strace for “what is it doing,” ltrace for “what library call failed,” bpftrace for production observability.
  2. Filter early. strace -e openat gives a focused view; full strace is overwhelming.
  3. For production, minimize duration. Attach, capture, detach within seconds.
  4. For permission errors, look for EACCES / EPERM in the trace output.
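
Step 4 gets easier once a trace has been saved with `-o`. The sketch below is one possible way to count failed calls by syscall and errno. `errno_summary` is a hypothetical helper name, not part of strace, and it assumes the default line format (`name(args) = -1 ERRNO (text)`), optionally prefixed by a PID or timestamp.

```shell
#!/bin/sh
# errno_summary: count failed syscalls in strace output, grouped by
# syscall name and errno. Hypothetical helper, not part of strace.
# Reads the files given as arguments, or stdin when there are none.
errno_summary() {
    awk '
        # Skip "<... call resumed>" lines from -f traces: the syscall
        # name is not at the start of those lines.
        / = -1 E[A-Z0-9]+/ && !/ resumed>/ {
            # Everything before the first "(" ends with the syscall name;
            # any "[pid N]" or timestamp prefix is separated by spaces.
            head = substr($0, 1, index($0, "(") - 1)
            n = split(head, parts, " ")
            if (n == 0) next
            # Pull the errno symbol out of " = -1 ERRNO".
            match($0, / = -1 E[A-Z0-9]+/)
            err = substr($0, RSTART + 6, RLENGTH - 6)
            count[parts[n] " " err]++
        }
        END { for (k in count) print count[k], k }
    ' "$@" | LC_ALL=C sort -rn
}
```

A typical use would be `strace -f -o /tmp/trace.txt ./app` followed by `errno_summary /tmp/trace.txt`. A large ENOENT count on `openat` is often normal (library and locale search paths), so the less common rows tend to be the interesting ones.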

Useful commands

# strace basics
sudo strace -p <pid>                          # attach (Ctrl-C to detach)
sudo strace -p <pid> -e openat                  # only openat() calls
sudo strace -p <pid> -e trace=network           # network calls
sudo strace -p <pid> -f -o /tmp/trace.txt       # follow children, to file
sudo strace -c -p <pid>                         # summary at exit (Ctrl-C)
sudo strace -tt -T -p <pid>                     # timestamps + duration

# At launch
strace -e openat ./command
strace -ff -o trace.log ./command               # one file per PID
strace -e trace=signal ./command                # only signals

# Filter to common need
strace -e trace=file -p <pid>                   # all FS-related
strace -e trace=desc -p <pid>                   # FD-related (read/write/close)
strace -e trace=process -p <pid>                # fork/exec/clone
strace -e trace=network -p <pid>                # socket/connect/accept
strace -y -p <pid>                              # translate FDs to paths

# What is a hung process doing?
sudo cat /proc/<pid>/stack
sudo cat /proc/<pid>/wchan
sudo cat /proc/<pid>/status | grep State
sudo strace -p <pid>                            # see current syscall

# ltrace (library calls; modern binaries often fail to hook)
sudo ltrace -p <pid>
sudo ltrace -e malloc+free+strlen ./command

# perf trace (lower overhead than strace)
sudo perf trace -p <pid>
sudo perf trace --no-syscalls --event 'syscalls:sys_enter_openat' -p <pid>

# bpftrace one-liners
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
sudo bpftrace -e 'tracepoint:syscalls:sys_exit_openat /args->ret < 0/ { printf("FAIL: %s %d\n", comm, args->ret); }'

# bcc tools (more polished)
sudo /usr/share/bcc/tools/opensnoop -p <pid>
sudo /usr/share/bcc/tools/execsnoop
sudo /usr/share/bcc/tools/tcpconnect
sudo /usr/share/bcc/tools/biolatency 5
sudo /usr/share/bcc/tools/funccount 'vfs_*'
sudo /usr/share/bcc/tools/stackcount -p <pid> -K do_softirq

# ftrace (very low overhead, kernel-side)
sudo trace-cmd record -e sched_switch sleep 5
sudo trace-cmd report
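
Which of the commands above are usable depends on what is installed and on whether non-root attach is allowed. As a small sketch (the function name `tracer_inventory` is made up for illustration), the following reports which tracers are on `PATH` and, where the Yama LSM is present, the current `ptrace_scope` value. A value of 0 allows attaching to any process of the same user, 1 restricts attach to descendants, and 2 or 3 require privilege or block attach entirely.

```shell
#!/bin/sh
# tracer_inventory: show which tracing tools are installed and whether
# ptrace attach is restricted. Hypothetical helper for illustration.
tracer_inventory() {
    for tool in strace ltrace perf bpftrace trace-cmd; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%-10s %s\n' "$tool" "$path"
        else
            printf '%-10s %s\n' "$tool" "not installed"
        fi
    done
    # Yama ptrace_scope only exists on kernels built with the Yama LSM.
    scope_file=/proc/sys/kernel/yama/ptrace_scope
    if [ -r "$scope_file" ]; then
        printf 'ptrace_scope %s\n' "$(cat "$scope_file")"
    fi
}
```

Running `tracer_inventory` before an incident is cheaper than discovering mid-outage that `strace -p` fails as a regular user.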

Common scenarios

“App can’t find config”

strace -e openat -f ./app 2>&1 | grep ENOENT
# Shows every "No such file" — usually the missing config path
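
The grep above can print the same path many times, and dynamic-loader probes add noise. As a rough sketch, a saved trace can be reduced to unique missing paths with counts. `missing_paths` is a hypothetical helper name, and it assumes the path is the first quoted string on each ENOENT line.

```shell
#!/bin/sh
# missing_paths: list each path that failed with ENOENT, most frequent
# first. Reads the files given as arguments, or stdin when there are none.
missing_paths() {
    grep -h ' = -1 ENOENT ' "$@" |
        sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' |
        LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn
}
```

For example, `strace -f -e openat -o /tmp/open.txt ./app` followed by `missing_paths /tmp/open.txt`. Paths under the application's own config or data directories are usually the ones worth checking first.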

“Permission denied” with no app log

strace -e openat,access -f ./app 2>&1 | grep -E "EACCES|EPERM"
# Shows the exact path and operation that failed
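
An EACCES on a path is not always about the file itself: a directory anywhere above it that lacks search (execute) permission for the calling user produces the same error. One way to check, sketched here with a hypothetical `walk_perms` helper and assuming GNU `stat` (for the `-c` option), is to print the mode and owner of every component from the failing path up to `/`.

```shell
#!/bin/sh
# walk_perms: print mode, owner:group and name for a path and each parent
# directory, leaf first. Assumes GNU coreutils stat (-c format option).
walk_perms() {
    path=$1
    while :; do
        stat -c '%A %U:%G %n' "$path" 2>/dev/null ||
            printf '%s\n' "?????????? ?:? $path (missing or unreadable)"
        # Stop at the filesystem root, or at "." for relative paths.
        case $path in
            / | .) break ;;
        esac
        path=$(dirname "$path")
    done
}
```

Call it with the path taken from the failing trace line, for example `walk_perms /etc/myapp.conf`. Where util-linux is installed, `namei -l <path>` gives a similar view. Neither shows SELinux or AppArmor denials, which need their own audit logs.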

“App hangs”

# Identify state first
sudo cat /proc/<pid>/wchan
# Then attach
sudo strace -p <pid>
# Common: futex (lock), read (FD blocked), epoll_wait (idle)
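
Without `-f`, `strace -p <pid>` follows a single thread, and `/proc/<pid>/wchan` describes only the main thread, while a `futex` wait usually means some other thread holds the lock. A minimal sketch for listing every thread's state and wait channel follows. `thread_states` is a hypothetical name, and the output depends on what the kernel exposes to the calling user (`wchan` may read `0` without sufficient privilege).

```shell
#!/bin/sh
# thread_states PID: print "TID STATE WCHAN" for every thread of a process,
# read from /proc/PID/task. STATE is the one-letter code from the status
# file (R running, S sleeping, D uninterruptible, T stopped, Z zombie).
thread_states() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue
        tid=${task##*/}
        state=$(awk '/^State:/ { print $2; exit }' "$task/status" 2>/dev/null)
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:-?}"
    done
}
```

Threads stuck in state `D` point at I/O or a kernel wait. A thread in `R` while the rest sit in futex waits is a candidate lock holder, and `sudo strace -f -p <pid>` attaches to all threads at once if a closer look is needed.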

Slow startup profiling

strace -c ./app          # summary at exit shows top syscalls by time
# Or:
strace -tt -e openat ./app 2>&1 | head -100
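
`strace -c` aggregates by syscall, which can hide one slow call among thousands of fast ones. A complementary sketch is to record with `-T` and sort individual calls by duration. `slowest_calls` is a hypothetical helper, and it relies on the `<seconds>` suffix that `-T` appends to each line.

```shell
#!/bin/sh
# slowest_calls [N]: read strace -T output on stdin and print the N
# slowest calls (default 10), each prefixed with its duration in seconds.
slowest_calls() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the angle brackets and put the duration first.
            print substr($0, RSTART + 1, RLENGTH - 2), $0
         }' | LC_ALL=C sort -rn | head -n "${1:-10}"
}
```

For example, `strace -f -T -o /tmp/startup.txt ./app` followed by `slowest_calls 20 < /tmp/startup.txt`. A slow `connect` or `poll` near the top suggests DNS or an upstream service rather than file I/O.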

Find which library call failed

ltrace -e '*' ./app 2>&1 | tail -50
# (May not work on modern dynamic binaries; if it shows nothing useful, fall back to strace)

Production-safe peek at a syscall

# Use perf trace (lower overhead) or eBPF
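sudo timeout -s INT 10 perf trace -p <pid>   # cap the capture at 10 seconds
# (perf trace --duration N filters to calls slower than N ms; it is not a time limit)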
sudo /usr/share/bcc/tools/opensnoop -p <pid>

Common findings this catches

  • openat("/etc/myapp.conf", ...) = -1 ENOENT → config missing.
  • connect(... 2.3.4.5:443) = -1 ETIMEDOUT → network reachability issue, not the app.
  • Hung process in futex(FUTEX_WAIT, ...) → lock contention; investigate other threads.
  • stat() on hundreds of paths at startup → JVM/Python loading every classpath/site-packages dir. Cache or trim.
  • read(fd, ...) blocked on a socket → upstream slow; correlate with target.
  • mmap failing with ENOMEM → memory pressure or vm.max_map_count too low (common JVM issue: set sysctl -w vm.max_map_count=262144).
  • access() returning EACCES before open → app pre-checking; SELinux or POSIX perms.
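
Some of the patterns in this list can be spotted mechanically in a saved trace. The following is only a rough sketch: `triage_hints` and its tag names are invented for illustration, it covers four of the findings, and a tag is a hint about where to look rather than a diagnosis.

```shell
#!/bin/sh
# triage_hints: tag strace lines that match a few well-known failure
# patterns. Reads the files given as arguments, or stdin when none.
triage_hints() {
    awk '
        /openat\(.* = -1 ENOENT/     { print "missing-file " $0; next }
        /connect\(.* = -1 ETIMEDOUT/ { print "net-timeout " $0; next }
        /mmap\(.* = -1 ENOMEM/       { print "map-limit " $0; next }
        / = -1 (EACCES|EPERM) /      { print "permission " $0; next }
    ' "$@"
}
```

Usage could look like `triage_hints /tmp/trace.txt | cut -d' ' -f1 | sort | uniq -c` to see which category dominates before reading individual lines.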

When to escalate

  • Production hang requiring extensive tracing → move to eBPF tools to avoid amplifying the problem.
  • Trace evidence of a kernel-side bug (a specific syscall returning an impossible value) → file a kernel bug with a reproducer.
  • Suspected tracer incompatibility (for example, ltrace failing on a relocated binary) → switch tools rather than fight it.
