You are a senior SRE who has computed p99/p95 latency in PromQL across many services. You know the histogram trap (wrong bucket bounds = wrong p99) and how histograms differ from summaries.

I will provide:
- The latency metric and its buckets (`_bucket{le="..."}` values)
- Current query
- Symptom (p99 looks wrong, NaN, suspicious value)

Your job:

1. **Histogram vs summary**:
   - **Histogram**: observations are counted into predefined buckets; `histogram_quantile()` interpolates within them at query time
   - **Summary**: quantiles are computed client-side and cannot be aggregated across instances
   - For aggregation: histogram is the choice

2. **Histogram metrics**:
   - `<metric>_bucket{le="<value>"}`: cumulative count of observations ≤ value
   - `<metric>_count`: total observations
   - `<metric>_sum`: sum of all values

3. **For correct p99**:
   ```promql
   histogram_quantile(0.99, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))
   ```
   - `sum by (le)` keeps the `le` label
   - `rate()` is applied per bucket
   - `histogram_quantile` interpolates within the matching bucket

4. **Common errors**:
   - `histogram_quantile(0.99, sum(rate(...[5m])))`: missing `by (le)`, so the `le` label is dropped and there are no buckets to interpolate; the query returns no usable result
   - `histogram_quantile(0.99, http_request_duration_seconds_bucket)`: no `rate()`, so it uses cumulative counts since process start; wrong for recent latency
   - p99 beyond the highest finite bucket: the result is pinned at that bucket's upper bound, not the real value

5. **For bucket bound choice**:
   - Buckets should cover the latency range
   - Roughly logarithmic spacing is typical: 0.01, 0.05, 0.1, 0.5, 1, 5, 10
   - Use tighter buckets in the expected latency range

6. **For aggregation across services**:
   - Histograms can be summed by `le`
   - Quantiles cannot be summed or averaged (use histograms instead of summary metrics)

7. **For native histograms** (Prom 2.40+):
   - One composite sample per histogram instead of separate `_bucket` series
   - Better aggregation
   - Still experimental in some setups

8. **For percentile latency**:
   - Show p50, p95, and p99 together on the dashboard
   - Don't confuse percentiles with the average (`_sum / _count`)

Mark DESTRUCTIVE: removing buckets from a histogram (breaks historical comparisons), changing bucket bounds (silently changes percentile interpretation), aggregating summary quantiles across instances (incorrect).

---

Latency metric: [DESCRIBE]

Current query:
```promql
[PASTE]
```

Symptom: [DESCRIBE]

Why this prompt works

Histograms are routinely misused. The most common mistake is calling histogram_quantile on an aggregation that drops the le label (a missing by (le)). This prompt walks through the correct patterns.

How to use it

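Always use histograms for aggregatable percentiles.
Always keep le in the aggregation (sum by (le), plus any grouping labels) before calling histogram_quantile.
Choose buckets that cover the expected latency range.
For native histograms, verify that your Prometheus version and client library support them.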
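
To make the interpolation and the bucket-range rule concrete, here is a small Go sketch of how a quantile can be estimated from cumulative buckets. It is a simplified illustration under assumptions, not Prometheus's actual implementation: the `bucket` type, `bucketQuantile`, `sampleBuckets`, and the sample counts are all made up for this example, and it assumes 0 < q ≤ 1 and a positive first bucket bound.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket (hypothetical type for this sketch).
type bucket struct {
	upperBound float64 // value of the le label; math.Inf(1) for le="+Inf"
	count      float64 // cumulative count (or rate) of observations <= upperBound
}

// bucketQuantile estimates the q-quantile by linear interpolation inside the
// bucket that contains the target rank. It sorts the slice it is given.
func bucketQuantile(q float64, buckets []bucket) float64 {
	sort.Slice(buckets, func(i, j int) bool {
		return buckets[i].upperBound < buckets[j].upperBound
	})
	n := len(buckets)
	if n < 2 || !math.IsInf(buckets[n-1].upperBound, 1) {
		return math.NaN() // needs at least two buckets, the last one being +Inf
	}
	total := buckets[n-1].count
	if total == 0 {
		return math.NaN() // no observations in the window
	}
	rank := q * total
	// Index of the first finite bucket whose cumulative count reaches the rank,
	// or n-1 if only the +Inf bucket does.
	i := sort.Search(n-1, func(k int) bool { return buckets[k].count >= rank })
	if i == n-1 {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[n-2].upperBound
	}
	lower, prevCount := 0.0, 0.0
	if i > 0 {
		lower, prevCount = buckets[i-1].upperBound, buckets[i-1].count
	}
	upper := buckets[i].upperBound
	return lower + (upper-lower)*(rank-prevCount)/(buckets[i].count-prevCount)
}

// sampleBuckets returns made-up cumulative counts for 100 requests.
func sampleBuckets() []bucket {
	return []bucket{
		{0.1, 50},
		{0.5, 90},
		{1, 98},
		{math.Inf(1), 100},
	}
}

func main() {
	for _, q := range []float64{0.5, 0.95, 0.99} {
		fmt.Printf("p%.0f = %.4f\n", q*100, bucketQuantile(q, sampleBuckets()))
	}
}
```

With these sample counts, p95 is interpolated inside the 0.5 to 1 bucket, while p99 falls in the `+Inf` bucket and is reported as 1, the highest finite bound. A p99 that sits exactly on the top bucket bound is therefore a hint that the buckets stop too early.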

Useful commands

# Correct p99 by service
histogram_quantile(0.99,
  sum by (job, le)(rate(http_request_duration_seconds_bucket[5m])))

# p99 globally
histogram_quantile(0.99,
  sum by (le)(rate(http_request_duration_seconds_bucket[5m])))

# Average latency (NOT a percentile)
rate(http_request_duration_seconds_sum[5m])
  / rate(http_request_duration_seconds_count[5m])

# Bucket coverage check
sum by (le)(http_request_duration_seconds_bucket)

# Native histograms (Prometheus 2.40+, experimental): no _bucket suffix and no le label
histogram_quantile(0.99, sum(rate(http_request_duration_seconds[5m])))

Bucket bound patterns

For a typical web service (millisecond-to-second latency)

// Application-side (Go example, using client_golang's prometheus package)
histogramOpts := prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request latency in seconds.",
    Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
}

For high-throughput, sub-millisecond services

Buckets: []float64{0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5}

For batch jobs (seconds to minutes)

Buckets: prometheus.LinearBuckets(60, 60, 10)  // 60s, 120s, ..., 600s
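
Bucket lists like the ones above can also be generated rather than typed by hand. The sketch below is a standalone stand-in for the idea behind client_golang's `prometheus.ExponentialBuckets`; `exponentialBuckets` and `covers` are hypothetical helper names, and the 5 ms start, factor of 2, and count of 12 are assumed values for illustration, not a recommendation.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds: start, start*factor,
// start*factor^2, and so on. It returns nil for invalid arguments.
func exponentialBuckets(start, factor float64, count int) []float64 {
	if count < 1 || start <= 0 || factor <= 1 {
		return nil
	}
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start
		start *= factor
	}
	return bounds
}

// covers reports whether target is at or below the highest finite bound,
// i.e. whether a quantile near target can still be interpolated.
func covers(bounds []float64, target float64) bool {
	return len(bounds) > 0 && target <= bounds[len(bounds)-1]
}

func main() {
	bounds := exponentialBuckets(0.005, 2, 12) // 5 ms, doubling 11 times
	fmt.Println(bounds)
	fmt.Println("covers a 2 s latency target:", covers(bounds, 2))
	fmt.Println("covers a 30 s timeout:", covers(bounds, 30))
}
```

Checking the top bound against the slowest latency you care about before shipping is cheaper than changing bounds later, since a bound change silently shifts how percentiles are interpolated.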

Common findings this catches

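p99 empty or NaN → by (le) missing from the aggregation (the le label is dropped, so the result is empty), or no observations in the rate window (NaN).
p99 pinned at the highest finite bucket bound → buckets don't cover the range; the long tail lands in the +Inf bucket.
p99 constant despite latency change → bucket bounds too coarse around the values that matter.
Summary quantiles averaged or summed across instances → incorrect; switch to a histogram.
p99 lower than max → expected; a percentile is a statistical estimate, not the maximum.
histogram_quantile on non-rated buckets → uses cumulative counts since process start, so it does not reflect recent latency.
Native histograms missing from dashboards → check client library and Prometheus version support.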
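
The summary finding is easy to reproduce with made-up numbers. The sketch below assumes two hypothetical instances, one serving 1,000 fast requests and one serving 10 slow ones, and compares the average of their per-instance p99 values with the p99 of the merged observations. All data and function names are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (1 <= p <= 100)
// of a non-empty slice, without modifying the input.
func percentile(values []float64, p int) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * n)
	return sorted[rank-1]
}

// repeat returns n copies of v.
func repeat(v float64, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = v
	}
	return out
}

// instanceA: 990 requests at 0.1 s and 10 at 0.2 s (hypothetical data).
func instanceA() []float64 { return append(repeat(0.1, 990), repeat(0.2, 10)...) }

// instanceB: 10 requests at 5 s (hypothetical data).
func instanceB() []float64 { return repeat(5, 10) }

// averagedP99 is what averaging per-instance summary quantiles produces.
func averagedP99() float64 {
	return (percentile(instanceA(), 99) + percentile(instanceB(), 99)) / 2
}

// mergedP99 is the p99 over all observations from both instances.
func mergedP99() float64 {
	return percentile(append(instanceA(), instanceB()...), 99)
}

func main() {
	fmt.Printf("average of per-instance p99: %.2f s\n", averagedP99())
	fmt.Printf("p99 of merged observations:  %.2f s\n", mergedP99())
}
```

Here the averaged figure is 2.55 s while the p99 of the merged observations is 0.2 s, because averaging gives ten slow requests the same weight as a thousand fast ones. Summing histogram buckets by le before calling histogram_quantile avoids this, since bucket counts add up correctly across instances.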

When to escalate

Bucket choice for a new service: coordinate with the app team.
Migration from summary to histogram: plan a staged rollout.
Native histogram adoption: coordinate Prometheus and client library versions.

Related prompts

Grafana Dashboard Performance Prompt

PromQL Query Optimization Prompt

SLO Error Budget & Multi-Window Burn Rate Alerts Prompt
