Tracing Linux with bpftrace and eBPF: A Practical Guide

For 20 years the answer to “why is this process slow?” was some combination of strace, perf, and guesswork. strace works, but it can slow a process down by 100x and you can’t safely point it at a busy production daemon. perf is powerful and cryptic. Then eBPF arrived, and for the first time I could ask the running kernel a precise question and get a precise answer without rebuilding anything or taking the box down.

bpftrace is the friendliest way in. It’s awk for the kernel: short one-liners that attach to kernel and userspace events, aggregate in-kernel, and print when you hit Ctrl-C. This is how I actually use it.

What eBPF gives you that nothing else does

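eBPF runs small, verified programs inside the kernel in response to events: syscalls, function entry and exit, network packets, scheduler decisions. Because the aggregation happens in kernel space, you can trace high-frequency events on a production host without the overhead that makes strace dangerous. Under the hood, strace uses ptrace, which stops the traced process at every syscall and wakes the tracer to inspect it. An eBPF program runs inline when the event fires and, when it aggregates into a map, hands userspace only the summary.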

The mental model: you attach a probe to an event, run a tiny action when it fires, and accumulate results in a map. The kernel does the counting. You just read the histogram.

Install and confirm it works

On most modern distros:

# Debian / Ubuntu
sudo apt install bpftrace
# RHEL / Fedora / Rocky
sudo dnf install bpftrace

Confirm the kernel exposes what you need:

sudo bpftrace -l 'tracepoint:syscalls:*' | head
uname -r   # you want 5.x or newer for the good stuff

If bpftrace -l prints a wall of probes, you’re ready.
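If you set up bpftrace on many hosts, the two checks above can be folded into a small script. This is a minimal POSIX sh sketch. The helper names kernel_at_least and preflight are hypothetical, and the version parsing only looks at the major number, using the 5.x rule of thumb from the comment above.

```shell
#!/bin/sh
# Preflight sketch: is bpftrace installed, and is the kernel 5.x or newer?
# The 5.x threshold is this article's rule of thumb, not a check bpftrace makes.

# kernel_at_least WANT RELEASE  (hypothetical helper)
# Succeeds if the major number of RELEASE (e.g. "6.8.0-45-generic") is >= WANT.
kernel_at_least() {
    kv_want=$1
    kv_major=${2%%.*}              # everything before the first dot
    case $kv_major in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: treat as unsupported
    esac
    [ "$kv_major" -ge "$kv_want" ]
}

# preflight  (hypothetical helper)
# Prints a one-line verdict and returns non-zero if either check fails.
preflight() {
    if ! command -v bpftrace >/dev/null 2>&1; then
        echo "bpftrace not found on PATH" >&2
        return 1
    fi
    if ! kernel_at_least 5 "$(uname -r)"; then
        echo "kernel $(uname -r) is older than 5.x" >&2
        return 1
    fi
    echo "ok: bpftrace found, kernel $(uname -r)"
}

# Usage: preflight && sudo bpftrace -l 'tracepoint:syscalls:*' | head
```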

One-liners I reach for constantly

Which processes are opening which files? This replaces a lot of strace -e openat flailing:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

Count syscalls by process to find the chatty offender:

sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

Hit Ctrl-C and you get a sorted tally per command name. The @[key] = count() pattern is the heart of bpftrace — it builds a map and prints it on exit.
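Since bpftrace is “awk for the kernel,” the @[key] = count() pattern maps almost one-to-one onto an awk associative array. The sketch below counts made-up event lines in userspace purely to show the shape of the idea. count_by_key is a hypothetical helper. The real advantage of bpftrace is that this counting happens inside the kernel, so individual events never have to be copied out.

```shell
#!/bin/sh
# Userspace analog of the bpftrace one-liner's  @[comm] = count();
# Each input line is one "event" whose key is a command name.
# The awk array plays the role of the bpftrace map; the END block plays the
# role of bpftrace printing the map when you hit Ctrl-C.
count_by_key() {
    awk '{ counts[$1]++ }
         END { for (k in counts) printf "@[%s]: %d\n", k, counts[k] }' |
        sort -t: -k2 -n    # ascending by count, smallest first
}

# Example with made-up events:
#   printf 'nginx\nnginx\nsshd\nnginx\n' | count_by_key
```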

New processes as they spawn, which is gold for catching a runaway cron job or a fork bomb in the making:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'

Histograms beat averages every time

Averages lie. A disk with a p50 latency of 2ms and a p99 of 900ms is a very different system from one that serves every request in a flat 20ms, yet both can report an average of about 20ms. bpftrace gives you distributions cheaply.
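To see the arithmetic behind that claim, here is a worked sketch with hypothetical data: 98 requests at 2ms and 2 requests at 900ms. The percentiles use the nearest-rank method, which is one of several conventions. latency_samples and summarize are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical sample: 98 requests at 2 ms, 2 requests at 900 ms.
latency_samples() {
    i=0
    while [ "$i" -lt 98 ]; do echo 2; i=$((i + 1)); done
    echo 900
    echo 900
}

# summarize: read one latency (ms) per line; print the mean plus the p50 and
# p99 using the nearest-rank method on the sorted values.
summarize() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            p50 = v[int((NR * 50 + 99) / 100)]   # index = ceil(0.50 * NR)
            p99 = v[int((NR * 99 + 99) / 100)]   # index = ceil(0.99 * NR)
            printf "mean=%.2f p50=%d p99=%d\n", sum / NR, p50, p99
        }'
}

# Usage: latency_samples | summarize
```

This prints mean=19.96 p50=2 p99=900. The average is roughly the same 20ms a perfectly flat disk would show, while the slowest requests are 45 times worse than that.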

This traces block I/O latency and prints a log2 histogram:

sudo bpftrace -e '
kprobe:blk_account_io_start { @start[arg0] = nsecs; }
kprobe:blk_account_io_done /@start[arg0]/ {
    @usecs = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}'

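When you stop it, you see exactly where your latency lives: a tidy histogram bucketed by microseconds. The first time you watch a “fast” disk reveal a fat tail above 100ms, you understand why this tool matters. Because these are kprobes, the function names depend on your kernel version; if bpftrace can’t attach, sudo bpftrace -l 'kprobe:*blk_account_io*' shows which variants your kernel actually has.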
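hist() sorts each value into a power-of-two bucket, which is why one histogram can span microseconds to seconds in about twenty rows. The awk sketch below reproduces just that bucketing rule on hypothetical input so you can see where a value lands. log2_hist is a hypothetical helper, and its labels are simplified compared with bpftrace’s real output.

```shell
#!/bin/sh
# Sketch of power-of-two bucketing, the idea behind bpftrace's hist().
# Reads one non-negative integer per line; prints "[lo, hi) count" per bucket.
# Simplified labels: bpftrace prints [0] and [1] as single-value buckets and
# abbreviates large bounds (e.g. 1K), which this sketch does not.
log2_hist() {
    awk '
        {
            v = $1; lo = 0
            # largest power of two <= v (values of 0 stay in bucket 0)
            if (v >= 1) { lo = 1; while (lo * 2 <= v) lo *= 2 }
            n[lo]++
        }
        END { for (lo in n) print lo, n[lo] }' |
    sort -n |
    awk '{ hi = ($1 == 0) ? 1 : $1 * 2; printf "[%d, %d) %d\n", $1, hi, $2 }'
}

# Usage: printf '0\n1\n3\n3\n150\n' | log2_hist
```

For example, a 150ms I/O measured in microseconds (150000) lands in [131072, 262144), which bpftrace would label [128K, 256K).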

Tracing a specific function’s latency

Say a service feels slow and you suspect a particular userspace function. As long as that function’s symbol is still in the binary (stripping removes most of them), you can attach a uprobe to it and time each call from entry to return:

sudo bpftrace -e '
uprobe:/usr/bin/myapp:process_request { @t[tid] = nsecs; }
uretprobe:/usr/bin/myapp:process_request /@t[tid]/ {
    @ns = hist(nsecs - @t[tid]); delete(@t[tid]);
}'

No recompile, no restart, no debugger pause. You attach, watch a few seconds of real traffic, and detach.
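To reuse the same entry-to-return pattern against other binaries without retyping it, a small wrapper can generate the program text. This is a minimal sketch. uprobe_latency_prog is a hypothetical helper, and it assumes the target is a plain, unmangled symbol in a binary that still has its symbols.

```shell
#!/bin/sh
# Hypothetical helper: print a bpftrace program that histograms the latency
# (in nanoseconds) of FUNC in BINARY, keyed per thread like the one-liner above.
uprobe_latency_prog() {
    ul_bin=$1
    ul_func=$2
    printf 'uprobe:%s:%s { @t[tid] = nsecs; }\n' "$ul_bin" "$ul_func"
    printf 'uretprobe:%s:%s /@t[tid]/ {\n' "$ul_bin" "$ul_func"
    printf '    @ns = hist(nsecs - @t[tid]); delete(@t[tid]);\n'
    printf '}\n'
}

# Usage (needs root; myapp and process_request are placeholders):
#   sudo bpftrace -l 'uprobe:/usr/bin/myapp:*'    # list attachable functions
#   sudo bpftrace -e "$(uprobe_latency_prog /usr/bin/myapp process_request)"
```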

Where AI fits into the workflow

bpftrace syntax is terse and the probe names are a memorization problem — there are thousands of them and they shift between kernel versions. This is precisely the kind of thing I hand to a model. I describe the symptom in plain English (“count TCP retransmits per remote IP”, “show me which process is calling fsync most often”) and ask for a bpftrace one-liner, then I read it before running it.

The read-before-run discipline matters here too: a bad kprobe won’t usually hurt you because the verifier rejects unsafe programs, but a wrong probe name just wastes time. Keep a few AI prompts tuned for translating symptoms into probes, and you skip the man-page archaeology.
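Part of the read-before-run step can be automated: before trusting a suggested probe name, check it against what your kernel actually lists. This is a minimal sketch. probe_exists is a hypothetical helper that only does an exact string match, so it catches typos and renamed probes but says nothing about whether the rest of the program is right.

```shell
#!/bin/sh
# probe_exists PROBE  (hypothetical helper)
# Reads a probe list on stdin, one per line as printed by `bpftrace -l`,
# and succeeds only if PROBE appears as an exact, whole-line match.
probe_exists() {
    grep -Fqx -- "$1"
}

# Usage on a real host, e.g. before running a suggested fsync counter such as
#   sudo bpftrace -e 'tracepoint:syscalls:sys_enter_fsync { @[comm] = count(); }'
# first check:
#   sudo bpftrace -l 'tracepoint:syscalls:*' |
#       probe_exists tracepoint:syscalls:sys_enter_fsync && echo "probe available"
```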

Safety notes from production use

The verifier is your friend, not a formality. If bpftrace refuses to load a program, it’s protecting the kernel. Don’t go hunting for ways around it.
High-frequency probes still cost something. Tracing every sys_enter on a 96-core box under load adds measurable overhead. Scope your probes — filter by comm or PID — and detach when you have your answer.
kprobe targets are not a stable ABI. A one-liner that works on 5.15 may need a different function name on 6.8. Prefer tracepoints when one exists; they’re the stable interface.
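In practice you need root; on kernels 5.8 and later the capabilities that matter for tracing are CAP_BPF plus CAP_PERFMON, not CAP_BPF alone. Treat that access like any other privileged capability.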
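The advice to filter by comm or PID can be baked into a helper so the unfiltered version never runs by accident. This is a minimal sketch. scoped_syscall_count is hypothetical: it treats an all-digits argument as a PID and anything else as a command name, and the map keys it produces are raw syscall numbers rather than names.

```shell
#!/bin/sh
# Hypothetical helper: build a syscall-count program scoped to one process.
# The tracepoint still fires on every syscall system-wide; the predicate just
# skips the map update for everything that doesn't match.
scoped_syscall_count() {
    sc_target=$1
    case $sc_target in
        '') echo "usage: scoped_syscall_count PID|COMM" >&2; return 1 ;;
        # non-numeric: match the command name (the kernel truncates comm
        # to 15 characters, so longer names must be shortened to match)
        *[!0-9]*) sc_filter="comm == \"$sc_target\"" ;;
        # all digits: match the process ID
        *) sc_filter="pid == $sc_target" ;;
    esac
    printf 'tracepoint:raw_syscalls:sys_enter /%s/ { @[args->id] = count(); }\n' "$sc_filter"
}

# Usage (needs root):
#   sudo bpftrace -e "$(scoped_syscall_count nginx)"
#   sudo bpftrace -e "$(scoped_syscall_count 1234)"
```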

Why this belongs in your toolkit

The shift eBPF represents is simple: you stop guessing and start measuring, on the live system, at the moment the problem is happening. No staging repro, no “let me add some logging and redeploy.” You ask the kernel a question and it answers.

Start with the count-by-comm and the I/O histogram above — those two alone will change how you debug. Once you trust the tool, it becomes the first thing you reach for instead of the last. For more low-level Linux debugging workflows, see the rest of the AI for Linux Admins guides.

Trace programs run in the kernel. Test one-liners on a non-critical host before pointing them at production, and verify probe names against your running kernel.