
PromQL Histogram & Quantile Calculation Prompt

Use Prometheus histograms correctly — `histogram_quantile`, bucket bounds, p99 latency calculation, histogram vs summary, native histograms.

Target user
SREs calculating latency percentiles in PromQL
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who has computed p95/p99 latency in PromQL across many services. You know the histogram trap (wrong bucket bounds = wrong p99) and how histograms differ from summaries.

I will provide:
- The latency metric and its buckets (`_bucket{le="..."}` values)
- Current query
- Symptom (p99 looks wrong, NaN, suspicious value)

Your job:

1. **Histogram vs summary**:
   - **Histogram** — observations counted into predefined buckets; `histogram_quantile()` estimates quantiles server-side by interpolating within a bucket
   - **Summary** — quantiles computed client-side; cannot be aggregated across instances
   - For aggregation: histogram is the choice
2. **Histogram metrics**:
   - `<metric>_bucket{le="<value>"}` — cumulative count of observations ≤ value
   - `<metric>_count` — total observations
   - `<metric>_sum` — sum of all values
3. **For correct p99**:
   ```promql
   histogram_quantile(0.99,
     sum by (le)(rate(http_request_duration_seconds_bucket[5m])))
   ```
   - `sum by (le)` keeps the le label
   - `rate()` per bucket
   - `histogram_quantile` interpolates
4. **Common errors**:
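   - `histogram_quantile(0.99, sum(rate(...[5m])))` — missing `by (le)` → empty result, because samples without an `le` label are ignored
   - `histogram_quantile(0.99, http_request_duration_seconds_bucket)` — not rated → quantile over every observation since process start, not recent latency; wrong
   - No observations in the rate window → NaN
   - p99 beyond the highest finite bucket → returns that bucket's upper bound, not the true value (`+Inf` only appears when the quantile argument is above 1)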
5. **For bucket bound choice**:
   - Buckets should cover the full latency range you expect, including the tail
   - Roughly logarithmic spacing is typical: 0.01, 0.05, 0.1, 0.5, 1, 5, 10
   - Use tighter buckets in the range where most requests land and around any SLO threshold
6. **For aggregation across services**:
   - Histograms are sum-able by le
   - Quantiles AREN'T sum-able (use histograms instead of summary metrics)
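7. **For native histograms** (Prometheus 2.40+):
   - One composite sample per series instead of a separate `_bucket` series per bound
   - Aggregation does not depend on matching hand-picked bucket bounds
   - Experimental in 2.x: requires `--enable-feature=native-histograms` and a client library that exposes native histograms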
8. **For percentile latency**:
   - p50, p95, p99 — combine in dashboard
   - Don't confuse with average (`_sum / _count`)

Mark DESTRUCTIVE: removing buckets from histogram (breaks historical), changing bucket bounds (silently changes percentile interpretation), summary aggregation across instances (incorrect).

---

Latency metric: [DESCRIBE]
Current query:
```promql
[PASTE]
```
Symptom: [DESCRIBE]

Why this prompt works

Histograms are routinely misused. Dropping le from the aggregation (missing by (le)) is the most common histogram_quantile trap. This prompt walks through the correct patterns.
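To make the bucket-bound trap concrete, here is a hedged walk-through of the linear interpolation that histogram_quantile performs on classic histograms. The bucket rates in the comments are invented for illustration and do not come from a real service.

```promql
# Assumed cumulative per-second bucket rates (illustrative numbers only):
#   le="0.1"  ->  900
#   le="0.5"  ->  980
#   le="1"    ->  995
#   le="+Inf" -> 1000
#
# Target rank for p99: 0.99 * 1000 = 990.
# 980 < 990 <= 995, so the quantile falls in the (0.5, 1] bucket.
# Linear interpolation inside that bucket:
#   0.5 + (1 - 0.5) * (990 - 980) / (995 - 980) ≈ 0.833
histogram_quantile(0.99,
  sum by (le)(rate(http_request_duration_seconds_bucket[5m])))
```

With these numbers the query reports roughly 0.833 s no matter where in the 0.5 to 1 s range the slow requests really sit, which is why coarse bounds around the tail make p99 imprecise.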

How to use it

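  1. Always use histograms for aggregatable percentiles.
  2. For classic histograms, always keep le in the aggregation: sum by (le) inside histogram_quantile.
  3. Choose buckets that cover the expected latency range, including the tail.
  4. For native histograms, verify compatibility (Prometheus version, feature flag, client library).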
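For the native histogram check in step 4, one possible sanity test is sketched below. It assumes the metric is exposed as a native histogram under the name http_request_duration_seconds (no _bucket suffix) and that the server ingests native histograms; both are assumptions about your setup.

```promql
# Returns series only when native histogram samples exist for this metric;
# histogram_count() ignores plain float samples.
histogram_count(rate(http_request_duration_seconds[5m]))

# Estimated fraction of requests that took between 0 and 0.5 seconds.
histogram_fraction(0, 0.5, sum(rate(http_request_duration_seconds[5m])))
```

If the first query returns nothing while the classic _bucket series have data, the metric is probably still being scraped as a classic histogram.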

Useful commands

# Correct p99 by service
histogram_quantile(0.99,
  sum by (job, le)(rate(http_request_duration_seconds_bucket[5m])))

# p99 globally
histogram_quantile(0.99,
  sum by (le)(rate(http_request_duration_seconds_bucket[5m])))

# Average latency (NOT a percentile)
rate(http_request_duration_seconds_sum[5m])
  / rate(http_request_duration_seconds_count[5m])

# Bucket coverage check
sum by (le)(http_request_duration_seconds_bucket)

# Native histograms (2.40+); no le grouping needed
histogram_quantile(0.99, sum(rate(http_request_duration_seconds[5m])))
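A dashboard usually shows several percentiles together, plus the share of requests under a latency target. The sketch below assumes a classic histogram with a bucket at le="0.5"; the 0.5 s threshold is a hypothetical SLO chosen only for illustration.

```promql
# p50, p95, and p99 per service: one query per panel series
histogram_quantile(0.50,
  sum by (job, le)(rate(http_request_duration_seconds_bucket[5m])))
histogram_quantile(0.95,
  sum by (job, le)(rate(http_request_duration_seconds_bucket[5m])))
histogram_quantile(0.99,
  sum by (job, le)(rate(http_request_duration_seconds_bucket[5m])))

# Fraction of requests served within 0.5s (needs a bucket exactly at that bound)
sum by (job)(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
  /
sum by (job)(rate(http_request_duration_seconds_count[5m]))
```

The threshold query reads a bucket counter directly, so it involves no interpolation error, unlike the percentile estimates.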

Bucket bound patterns

For typical web service (ms-second latency)

// Application-side (Go example)
histogramOpts := prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
}

For high-throughput, sub-millisecond

Buckets: []float64{0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5}

For batch jobs (seconds to minutes)

Buckets: prometheus.LinearBuckets(60, 60, 10)  // 60s, 120s, ..., 600s
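To judge whether a set of bounds fits real traffic, one option is to look at the cumulative share of observations at each bound. This is a sketch that assumes the metric name used throughout this page.

```promql
# Cumulative share (0 to 1) of observations at or below each bucket bound
sum by (le)(rate(http_request_duration_seconds_bucket[5m]))
  / ignoring(le) group_left
sum(rate(http_request_duration_seconds_count[5m]))
```

If the share jumps from a small value to nearly 1 between two adjacent bounds, most requests land in a single bucket and any percentile inside it is only a coarse estimate; adding bounds in that range would help.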

Common findings this catches

  • p99 returns no data → missing by (le); p99 = NaN → no observations in the rate window.
  • p99 stuck at the highest finite bucket bound → buckets don’t cover; long tail beyond highest bucket.
  • p99 constant despite latency change → bucket bounds too coarse.
  • Summary metrics aggregated → incorrect; switch to histogram.
  • p99 lower than max → expected (statistical, not max).
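  • histogram_quantile on non-rated bucket → quantile over all observations since process start, not current latency; wrong.
  • Native histogram missing from dashboards → check the Prometheus version, the native-histograms feature flag, and client library support.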
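A quick way to confirm the long-tail finding is to measure how much traffic falls beyond the highest finite bound. The sketch below assumes that bound is 10 seconds and that the client library renders it as le="10" (some libraries emit "10.0" instead).

```promql
# Share of observations slower than the highest finite bound (assumed: le="10")
1 - (
  sum(rate(http_request_duration_seconds_bucket{le="10"}[5m]))
    /
  sum(rate(http_request_duration_seconds_bucket{le="+Inf"}[5m]))
)
```

When this share is above 0.01, the p99 rank sits in the +Inf bucket and histogram_quantile can only report the 10 s bound, so a higher bound is needed to see the real tail.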

When to escalate

  • Bucket choice for new service — coordinate with app team.
  • Migration from summary to histogram — staged.
  • Native histogram adoption — Prom version coordination.
