Continuous Profiling With Pyroscope Alongside Prometheus

Metrics, logs, and traces will tell you that a service is burning CPU or leaking memory, and roughly when. What they almost never tell you is which function is responsible. For that you’ve historically had to reproduce the problem locally and run a profiler — which is hopeless for issues that only appear under production load. Continuous profiling closes that gap, and Grafana Pyroscope slots in next to Prometheus to give you always-on, low-overhead flame graphs from production. After running it for a while, I treat it as the fourth pillar.

What continuous profiling is

A traditional profiler samples your program’s stack at intervals to build a picture of where CPU time goes. Continuous profiling does the same thing, constantly, in production, at low enough overhead (typically a couple percent) that you leave it on forever. Pyroscope stores these profiles as time series of stack traces, so you can ask “what was using CPU in the checkout service between 14:02 and 14:08 yesterday” and get a flame graph for exactly that window — even though the incident is long over.

This is the key difference from a one-off profile: it’s retrospective. The spike already happened, you weren’t watching, and you can still see what caused it.

Where it sits relative to Prometheus

Prometheus answers how much and when: CPU is at 90%, starting at 14:02. Pyroscope answers why: 60% of that CPU is in JSON serialization because someone shipped an N+1 marshaling loop. They’re complementary, and the workflow is to pivot between them: your Prometheus alert fires, you see the CPU climb, then you jump to Pyroscope for the same time range to find the offending stack.

Pyroscope even reuses Prometheus’ label model. Profiles are tagged with service_name, and you can add labels like region or version, then filter flame graphs the same way you’d filter a PromQL query.
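Because the label model is shared, a Pyroscope query reads much like PromQL: a profile type followed by a label selector. The helper below is a hypothetical sketch that renders such a query with sorted labels, so the same filter always yields the same string. The profile type IDs used here (process_cpu:cpu:nanoseconds:cpu:nanoseconds and memory:inuse_space:bytes:space:bytes) are assumptions modeled on the IDs Pyroscope displays for Go profiles. Check them against the types your server lists.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// BuildSelector renders a profile type plus label matchers in PromQL-style
// syntax. Labels are sorted so the same filter always yields the same string.
// Hypothetical helper; the profile type string is passed through unchanged.
func BuildSelector(profileType string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		// strconv.Quote escapes quotes and backslashes inside label values.
		matchers = append(matchers, k+"="+strconv.Quote(labels[k]))
	}
	return profileType + "{" + strings.Join(matchers, ", ") + "}"
}

func main() {
	fmt.Println(BuildSelector("process_cpu:cpu:nanoseconds:cpu:nanoseconds",
		map[string]string{"service_name": "checkout", "region": "us-east"}))
	// prints process_cpu:cpu:nanoseconds:cpu:nanoseconds{region="us-east", service_name="checkout"}
}
```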

Getting profiles flowing

There are two ingestion styles. The pull model uses Grafana Alloy (or its predecessor, Grafana Agent) to scrape pprof endpoints, much like Prometheus scrapes /metrics:

// Alloy / Grafana Agent profiling scrape
pyroscope.scrape "default" {
  targets = [
    {"__address__" = "checkout:6060", "service_name" = "checkout"},
  ]
  forward_to = [pyroscope.write.default.receiver]
  profiling_config {
    profile.process_cpu { enabled = true }
    profile.memory      { enabled = true }
  }
}

pyroscope.write "default" {
  endpoint { url = "http://pyroscope:4040" }
}

For Go, you expose the standard net/http/pprof endpoints and the agent scrapes them. The only code changes are a blank import of net/http/pprof and an HTTP listener on the scraped port (6060 in the config above). The push model uses a language SDK that ships profiles directly:

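import (
    "log"
    "github.com/grafana/pyroscope-go"
)
func main() {
    profiler, err := pyroscope.Start(pyroscope.Config{
        ApplicationName: "checkout",
        ServerAddress:   "http://pyroscope:4040",
        Tags:            map[string]string{"region": "us-east"},
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileInuseSpace,
        },
    })
    if err != nil {
        log.Fatalf("starting pyroscope profiler: %v", err)
    }
    defer profiler.Stop() // stop profiling and upload any remaining data
}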

For languages without a native pprof story (Python, Ruby, .NET, Java), the SDKs bundle a sampling profiler for that runtime, such as py-spy for Python, rbspy for Ruby, or async-profiler for Java. In Kubernetes you can also go fully zero-instrumentation with an eBPF-based agent that profiles every process on the node, with no SDK and no code change, at the cost of less language-level detail.

Reading a flame graph during an incident

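When a CPU alert fires, here’s the loop. The flame graph’s x-axis is the proportion of samples (wider = more CPU; left-to-right order is not time), and the y-axis is stack depth. Pyroscope draws the root at the top with callees hanging beneath it, so you read top-down, following the widest bars: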

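A wide bar you didn’t expect is your hot spot: that function and its children are eating the CPU. Follow it down until its width is no longer covered by child frames; the uncovered part is time spent in the function itself rather than in its callees.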
Compare two time ranges (before the spike and during) using Pyroscope’s diff view. The diff highlights what got more expensive, which is usually the regression you shipped.
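The width rule is easier to apply once “total” and “self” are separated. A function’s total is every sample in which it appears anywhere on the stack (its bar width). Its self is the samples in which it is the leaf (the width its children do not cover). The sketch below does that bookkeeping over folded stacks (“main;handle;json.Marshal” -> sample count). It is an illustration, not Pyroscope’s code, and the function names and counts are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FrameCost is one function's share of the samples.
//   Total: samples where the function appears anywhere on the stack (bar width).
//   Self:  samples where it is the leaf frame (width not covered by children).
type FrameCost struct {
	Name        string
	Self, Total int
}

// HotSpots derives self and total counts from folded stacks and returns the
// functions ranked by self time, highest first.
func HotSpots(folded map[string]int) []FrameCost {
	costs := map[string]*FrameCost{}
	get := func(name string) *FrameCost {
		c, ok := costs[name]
		if !ok {
			c = &FrameCost{Name: name}
			costs[name] = c
		}
		return c
	}
	for stack, n := range folded {
		frames := strings.Split(stack, ";")
		seen := map[string]bool{} // a recursive function counts once per stack
		for _, f := range frames {
			if !seen[f] {
				seen[f] = true
				get(f).Total += n
			}
		}
		get(frames[len(frames)-1]).Self += n // the leaf is what was on-CPU
	}
	out := make([]FrameCost, 0, len(costs))
	for _, c := range costs {
		out = append(out, *c)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Self != out[j].Self {
			return out[i].Self > out[j].Self
		}
		return out[i].Name < out[j].Name // deterministic tie-break
	})
	return out
}

func main() {
	folded := map[string]int{
		"main;handle;db.Query":     30,
		"main;handle;json.Marshal": 60,
		"main;handle":              10,
	}
	for _, c := range HotSpots(folded) {
		fmt.Printf("%-14s self=%3d total=%3d\n", c.Name, c.Self, c.Total)
	}
}
```

In the example, main and handle are the widest bars at 100 samples each. Yet json.Marshal owns 60 of them directly, which is the kind of frame you are hunting for.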

The diff view is the single most useful feature. “What changed between the healthy window and the bad window” answers the regression question directly, the same way a metrics diff would, but at the function level.
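The core of such a diff can be sketched in a few lines. This is an illustration, not Pyroscope’s implementation. First, convert each window’s per-function self samples into shares of that window’s total, so a longer or busier window does not look worse merely for having more samples. Then rank functions by how much their share grew. The function names and sample counts are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Change is one function's share of self samples in each window (0..1).
type Change struct {
	Name          string
	Before, After float64
}

// Delta is the growth in share; positive means relatively more expensive.
func (c Change) Delta() float64 { return c.After - c.Before }

// shares converts raw sample counts into fractions of the window's total.
func shares(self map[string]int) map[string]float64 {
	total := 0
	for _, n := range self {
		total += n
	}
	out := make(map[string]float64, len(self))
	if total == 0 {
		return out
	}
	for name, n := range self {
		out[name] = float64(n) / float64(total)
	}
	return out
}

// Diff compares per-function self samples from a healthy window and a bad
// window and returns every function, largest increase in share first.
func Diff(healthy, bad map[string]int) []Change {
	before, after := shares(healthy), shares(bad)
	names := map[string]bool{}
	for n := range before {
		names[n] = true
	}
	for n := range after {
		names[n] = true
	}
	out := make([]Change, 0, len(names))
	for n := range names {
		// A function missing from one window reads as a 0 share there.
		out = append(out, Change{Name: n, Before: before[n], After: after[n]})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Delta() != out[j].Delta() {
			return out[i].Delta() > out[j].Delta()
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	healthy := map[string]int{"db.Query": 50, "json.Marshal": 20, "handle": 30}
	bad := map[string]int{"db.Query": 100, "json.Marshal": 600, "handle": 300}
	for _, c := range Diff(healthy, bad) {
		fmt.Printf("%-14s %5.1f%% -> %5.1f%%\n", c.Name, 100*c.Before, 100*c.After)
	}
}
```

Here json.Marshal jumps from 20% to 60% of samples while handle stays at 30%, so it tops the list. db.Query shrinks in share even though its absolute count doubled, which is why normalizing matters.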

Memory profiling catches the slow leaks

CPU is the obvious use, but in-use-space memory profiles are how I’ve found leaks that took days to manifest. Pyroscope’s inuse_space profile shows which allocation sites hold live memory right now. Watch it trend over a day and the leaking call site grows steadily while everything else stays flat — the flame graph literally shows you the leak getting wider.
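The “grows steadily while everything else stays flat” pattern can also be checked numerically. The rough sketch below assumes you have periodic inuse_space totals per allocation site. It fits a least-squares line to each site’s live bytes over time and ranks sites by slope. The Snapshot type, the site names, and the byte counts are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is in-use (live) bytes per allocation site at one point in time.
type Snapshot struct {
	Hour  float64            // hours since the first snapshot
	Bytes map[string]float64 // allocation site -> in-use bytes
}

// slope fits bytes = a + b*hour by ordinary least squares and returns b,
// the growth rate in bytes per hour. A site absent from a snapshot counts as 0.
func slope(snaps []Snapshot, site string) float64 {
	n := float64(len(snaps))
	var sx, sy, sxx, sxy float64
	for _, s := range snaps {
		x, y := s.Hour, s.Bytes[site]
		sx += x
		sy += y
		sxx += x * x
		sxy += x * y
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0 // fewer than two distinct times: no trend to fit
	}
	return (n*sxy - sx*sy) / den
}

// LeakSuspects returns every allocation site, fastest-growing first.
func LeakSuspects(snaps []Snapshot) []string {
	rate := map[string]float64{}
	for _, s := range snaps {
		for site := range s.Bytes {
			if _, done := rate[site]; !done {
				rate[site] = slope(snaps, site)
			}
		}
	}
	sites := make([]string, 0, len(rate))
	for site := range rate {
		sites = append(sites, site)
	}
	sort.Slice(sites, func(i, j int) bool {
		if rate[sites[i]] != rate[sites[j]] {
			return rate[sites[i]] > rate[sites[j]]
		}
		return sites[i] < sites[j]
	})
	return sites
}

func main() {
	snaps := []Snapshot{
		{Hour: 0, Bytes: map[string]float64{"cache.Put": 10e6, "api.decodeRequest": 5e6}},
		{Hour: 6, Bytes: map[string]float64{"cache.Put": 40e6, "api.decodeRequest": 6e6}},
		{Hour: 12, Bytes: map[string]float64{"cache.Put": 70e6, "api.decodeRequest": 5e6}},
	}
	fmt.Println(LeakSuspects(snaps)) // prints [cache.Put api.decodeRequest]
}
```

A plain slope is sensitive to one-off spikes, so treat the ranking as a pointer back to the flame graph rather than a verdict.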

Pair this with the Prometheus view of the process: process_resident_memory_bytes climbing in a sawtooth that never fully recovers is your alert; the Pyroscope inuse_space profile is your culprit.
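The “never fully recovers” part has a simple signature: each trough in the sawtooth, the level right after garbage is reclaimed, sits higher than the last one. Below is a rough Go sketch over a series of RSS readings, for example successive values of process_resident_memory_bytes. Fetching them from Prometheus is left out, and the function names are hypothetical.

```go
package main

import "fmt"

// Troughs returns the local minima of a series: points lower than the previous
// reading and no higher than the next, e.g. memory right after a GC cycle.
func Troughs(rss []float64) []float64 {
	var out []float64
	for i := 1; i+1 < len(rss); i++ {
		if rss[i] < rss[i-1] && rss[i] <= rss[i+1] {
			out = append(out, rss[i])
		}
	}
	return out
}

// NeverRecovers reports whether there are at least minTroughs troughs and
// each one is higher than the previous one: the floor keeps rising.
func NeverRecovers(rss []float64, minTroughs int) bool {
	t := Troughs(rss)
	if len(t) < minTroughs {
		return false
	}
	for i := 1; i < len(t); i++ {
		if t[i] <= t[i-1] {
			return false
		}
	}
	return true
}

func main() {
	leaky := []float64{100, 180, 120, 200, 140, 220, 160, 240}
	healthy := []float64{100, 180, 100, 190, 100, 185, 100, 180}
	fmt.Println(NeverRecovers(leaky, 3), NeverRecovers(healthy, 3)) // prints true false
}
```

Requiring strictly rising troughs is brittle on noisy real data; a tolerance, or a trend fit over the troughs, would be more forgiving.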

Cost and overhead reality

The two questions everyone asks. Overhead: CPU profiling at the default sample rate is typically 1-3% — cheap enough to run everywhere. eBPF whole-node profiling is similar. Storage: profiles are more voluminous than metrics, so set retention deliberately (a couple of weeks is usually plenty — you investigate recent incidents, not ancient ones) and consider profiling a representative subset of replicas rather than every pod if cost matters.

Why bother

Without continuous profiling, “the service is slow under load and we can’t reproduce it” is a multi-day investigation that often ends in a shrug. With it, you click the bad time window and read the answer off a flame graph. The first time it turns a week-long mystery into a five-minute look, it pays for itself.

For the tracing and metrics pillars Pyroscope complements, see our distributed-tracing and exemplar guides in the Prometheus and monitoring category. And when a CPU or memory alert kicks off the investigation, our monitoring alert assistant helps tune the rule that sent you to the flame graph.

Profiling overhead and SDK support vary by language and runtime. Validate the overhead in a staging environment before enabling fleet-wide.