Grafana Heatmap & Histogram Visualization Prompt
Configure Grafana heatmaps for latency distribution — bucket binning, classic vs new heatmap, histogram source data.
- Target user: SREs visualizing latency distributions
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has built heatmaps to visualize latency distributions, request size patterns, and other histograms over time.

I will provide:
- The metric being visualized
- Current panel config
- Symptom (heatmap blank, colors wrong, heatmap tells the wrong story)

Your job:

1. **When to use a heatmap**:
   - Visualizing a distribution over time
   - Where averages mislead (latency)
   - Identifying outliers / multi-modal distributions
2. **For Prometheus histogram input**:
   - `sum by (le)(rate(http_request_duration_seconds_bucket[5m]))`
   - Format as Heatmap (data source option)
   - `le` label = bucket upper boundary
3. **For panel options**:
   - **Calculate from data**: enable for raw, unbucketed values; leave off when the query already returns histogram buckets
   - **Use cumulative** vs **use rate**
   - Color scheme: spectrum (red-blue) or sequential
4. **For y-axis**:
   - Log scale is typical for latency
   - Set min/max if needed
5. **For interpretation**:
   - X axis: time
   - Y axis: value bucket
   - Color: count
   - Bright bands = popular values
   - Multi-modal: two bright bands at different y values
6. **For overlay**:
   - p99 / p95 lines on top of the heatmap
   - Combine in a single panel
7. **For new heatmap (vs classic)**:
   - Better rendering
   - More options
   - Migration path
8. **For non-histogram input**:
   - Group rate by `le` after binning
   - Use Reduce transformation

Mark DESTRUCTIVE: changing bucket bounds at the metric source (breaks historical heatmaps), removing the histogram metric (loses the distribution), heatmap with the wrong unit (gets misread).

---

Metric: [DESCRIBE]
Current config: [DESCRIBE]
Symptom: [DESCRIBE]
Why this prompt works
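Heatmaps are powerful but easy to misconfigure. This prompt walks through each decision in order: when a heatmap fits, how to shape the Prometheus query, which panel options and y-axis scale to use, and how to read the result.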
How to use it
- Use histogram metric as source.
- Match buckets to range of interest.
- Log scale y-axis typical.
- Overlay percentiles for context.
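Matching buckets to the range of interest usually means spacing them geometrically, which also suits a log-scale y-axis. The following is a minimal Python sketch, assuming a hypothetical helper `exponential_buckets` (loosely modeled on the exponential bucket helpers found in Prometheus client libraries, not copied from any of them). The start value, factor, and count are illustrative choices, not recommendations.

```python
def exponential_buckets(start: float, factor: float, count: int) -> list[float]:
    """Return `count` upper bounds: start, start*factor, start*factor**2, ..."""
    if start <= 0 or factor <= 1 or count < 1:
        raise ValueError("need start > 0, factor > 1, count >= 1")
    return [start * factor**i for i in range(count)]


# Assumed range of interest: 1 ms up to roughly 8 s, doubling each step.
bounds = exponential_buckets(0.001, 2, 14)
print([round(b, 3) for b in bounds])
```

With a constant factor, every bucket takes the same height on a log-scale y-axis, so no band looks artificially thick or thin.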
Useful PromQL for heatmaps
# Per-bucket rate (heatmap input)
sum by (le)(rate(http_request_duration_seconds_bucket[5m]))
# For new heatmap (instant samples per bucket)
sum by (le)(increase(http_request_duration_seconds_bucket[1m]))
# Overlay percentile
histogram_quantile(0.99, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))
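Classic Prometheus histogram buckets are cumulative: each `le` series counts every observation at or below that bound. A heatmap cell, by contrast, needs the count that falls inside one bucket. The sketch below illustrates that subtraction in Python; `per_bucket_counts` and the sample values are hypothetical, and this is an illustration of the idea rather than Grafana's own code.

```python
def per_bucket_counts(cumulative: dict[str, float]) -> list[tuple[float, float]]:
    """Convert cumulative `le` buckets into (upper_bound, count_in_bucket) pairs."""
    # float("+Inf") parses to infinity, so the +Inf bucket sorts last.
    ordered = sorted((float(le), total) for le, total in cumulative.items())
    cells = []
    previous_total = 0.0
    for upper, total in ordered:
        # Clamp at zero in case of small inconsistencies between series.
        cells.append((upper, max(total - previous_total, 0.0)))
        previous_total = total
    return cells


# Hypothetical per-second rates keyed by the `le` label.
sample = {"0.1": 40.0, "0.5": 90.0, "1": 98.0, "+Inf": 100.0}
cells = per_bucket_counts(sample)
print(cells)
```

If a heatmap shows every row growing brighter toward the top, the data is probably still cumulative and this step has not been applied.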
Panel config
{
  "type": "heatmap",
  "title": "Request Latency Heatmap",
  "datasource": "Prometheus",
  "targets": [{
    "expr": "sum by (le)(rate(http_request_duration_seconds_bucket[5m]))",
    "format": "heatmap",
    "legendFormat": "{{le}}"
  }],
  "options": {
    "calculate": false,
    "yAxis": {
      "unit": "s",
      "logBase": 2,
      "min": "0.001",
      "max": "10"
    },
    "color": {
      "mode": "scheme",
      "scheme": "RdYlGn",
      "steps": 16,
      "reverse": true
    },
    "exemplars": { "color": "rgba(255,0,255,0.7)" }
  }
}
(`"reverse": true` flips the color scheme direction; with RdYlGn reversed, high counts render at the red end.)
Multi-panel pattern (heatmap + percentile overlay)
{
  "type": "timeseries",
  "title": "Latency Heatmap + p99",
  "targets": [
    {
      "expr": "histogram_quantile(0.99, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))",
      "legendFormat": "p99"
    },
    {
      "expr": "histogram_quantile(0.50, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))",
      "legendFormat": "p50"
    }
  ]
}
(Use side-by-side panels: the heatmap next to this percentile time series. The alternative is to combine both in a single panel.)
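To see why the overlaid percentile lines depend on bucket bounds, it helps to work through how a quantile is estimated from buckets. The sketch below is a simplified Python approximation of the documented `histogram_quantile` behavior (linear interpolation inside the matching bucket, lower bound of the first bucket assumed to be 0, and the highest finite bound returned when the quantile lands in the `+Inf` bucket). The function name `approx_quantile` and the sample data are hypothetical, and edge cases that Prometheus handles are omitted.

```python
import math


def approx_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper, count in ordered:
        if count >= rank:
            if math.isinf(upper):
                # Quantile falls in +Inf: report the highest finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    return lower_bound


# Hypothetical cumulative rates per `le` bound.
data = [(0.1, 40.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
p50 = approx_quantile(0.50, data)
p95 = approx_quantile(0.95, data)
p99 = approx_quantile(0.99, data)
print(p50, p95, p99)
```

In this sample the p99 is pinned to 1.0, the highest finite bound, because the top 2% of requests sit in the `+Inf` bucket. A flat percentile line at exactly a bucket boundary is a hint that the bounds do not reach far enough.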
Common findings this catches
- Blank heatmap → format not set to “heatmap”; missing `by (le)`.
- Single horizontal band → all requests in one bucket; need finer buckets.
- Y-axis linear with logarithmic data → switch to log scale.
- Wrong unit → seconds vs ms confusion.
- Heatmap slow → too many buckets or too long a time range; reduce one or both.
- Low-traffic periods showing as faint columns in the distribution → expected if traffic varies.
- Bimodal not visible → bucket bounds too coarse.
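Several of the findings above can be checked mechanically before opening the dashboard. The following Python sketch lints a panel dictionary for them. The function `lint_heatmap_panel` is hypothetical, and the keys it reads (`targets[].format`, `options.yAxis.logBase`, `options.yAxis.unit`) simply mirror the panel config shown earlier; they are an assumption and may not match every Grafana version or heatmap variant.

```python
import re

# Matches an aggregation clause that includes the `le` label, e.g. "by (le)".
BY_LE = re.compile(r"by\s*\([^)]*\ble\b[^)]*\)")


def lint_heatmap_panel(panel: dict) -> list[str]:
    """Return human-readable findings for common heatmap misconfigurations."""
    findings = []
    for target in panel.get("targets", []):
        if target.get("format") != "heatmap":
            findings.append("format is not 'heatmap'")
        if not BY_LE.search(target.get("expr", "")):
            findings.append("query does not aggregate by (le)")
    y_axis = panel.get("options", {}).get("yAxis", {})
    if y_axis.get("logBase", 1) in (None, 1):
        findings.append("y-axis is linear")
    if not y_axis.get("unit"):
        findings.append("y-axis unit is not set")
    return findings


good = {
    "targets": [{
        "expr": "sum by (le)(rate(http_request_duration_seconds_bucket[5m]))",
        "format": "heatmap",
    }],
    "options": {"yAxis": {"unit": "s", "logBase": 2}},
}
bad = {
    "targets": [{"expr": "rate(http_request_duration_seconds_bucket[5m])"}],
    "options": {"yAxis": {}},
}
print(lint_heatmap_panel(good))
print(lint_heatmap_panel(bad))
```

The checks are deliberately shallow string and key tests; they flag likely causes of a blank or misleading heatmap but cannot confirm that the bucket bounds themselves are well chosen.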
When to escalate
- Adding histograms to app — coordinate with app team.
- Bucket bound design — domain knowledge.
- Custom histogram backend — engineering.
Related prompts
- Grafana Panel Types Selection Prompt: Choose the right Grafana panel (timeseries vs stat vs gauge vs bar gauge vs heatmap vs table); visualization principles for each.
- PromQL Histogram & Quantile Calculation Prompt: Use Prometheus histograms correctly (`histogram_quantile`, bucket bounds, p99 latency calculation, histogram vs summary, native histograms).
- PromQL Query Optimization Prompt: Diagnose slow PromQL queries (cardinality explosion, range vector traps, sum vs avg pitfalls, query timeout, recording rules opportunity).