
Grafana Heatmap & Histogram Visualization Prompt

Configure Grafana heatmaps for latency distribution — bucket binning, classic vs new heatmap, histogram source data.

Target user
SREs visualizing latency distributions
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who has built heatmaps to visualize latency distributions, request size patterns, and other histograms over time.

I will provide:
- The metric being visualized
- Current panel config
- Symptom (heatmap blank, colors wrong, wrong story)

Your job:

1. **When to use a heatmap**:
   - Visualizing distribution over time
   - Where averages mislead (latency)
   - Identifying outliers / multi-modal distributions
2. **For Prometheus histogram input**:
   - `sum by (le)(rate(http_request_duration_seconds_bucket[5m]))`
   - Format as Heatmap (data source option)
   - le label = bucket boundary
3. **For panel options**:
   - **Calculate from data**: turn on only for raw, unbucketed values; leave it off for pre-bucketed Prometheus histograms
   - **Cumulative** vs **rate**: query `rate()` of the bucket counters, not the raw, ever-growing totals
   - Color scheme: spectrum (red-blue) or sequential
4. **For y-axis**:
   - Log scale typical for latency
   - Set min/max if needed
5. **For interpretation**:
   - X axis: time
   - Y axis: value bucket
   - Color: count
   - Bright bands = popular values
   - Multi-modal: two bright bands at different y values
6. **For overlay**:
   - p99 / p95 lines on top of heatmap
   - Combine in single panel
7. **For new heatmap (vs classic)**:
   - Better rendering
   - More options
   - Migration path
8. **For non-histogram input**:
   - Turn on **Calculate from data** so the panel bins raw values into buckets
   - Or bin the series first with the Create heatmap transformation

Mark DESTRUCTIVE: changing bucket bounds at the metric source (breaks historical heatmaps), removing a histogram metric (loses the distribution), and showing a heatmap with the wrong unit (invites misreading).

---

Metric: [DESCRIBE]
Current config: [DESCRIBE]
Symptom: [DESCRIBE]

Why this prompt works

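Heatmaps are powerful but easy to misconfigure: a wrong query format, a dropped `le` label, or a mismatched unit each produces a blank or misleading panel. This prompt walks the model through each layer in turn, from the Prometheus query to the panel options to reading the result.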

How to use it

  1. Use a Prometheus histogram metric (the `_bucket` series) as the source.
  2. Match the bucket bounds to the range of values you care about.
  3. Use a log-scale y-axis, which is typical for latency.
  4. Add percentile lines for context.
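
Step 2 is easier to reason about with concrete bounds. The sketch below generates geometrically spaced bucket bounds, which spread evenly on a log-scale y-axis. The helper name `exponential_buckets` and the 1 ms starting point are assumptions for illustration; the idea follows the `ExponentialBuckets` helper in the Prometheus Go client but is not copied from it.

```python
def exponential_buckets(start: float, factor: float, count: int) -> list[float]:
    """Return `count` upper bounds: start, start*factor, start*factor**2, ..."""
    if start <= 0:
        raise ValueError("start must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    return [start * factor**i for i in range(count)]


# Hypothetical range of interest: 1 ms up to roughly 8 s, doubling each step.
bounds = exponential_buckets(start=0.001, factor=2, count=14)
```

A smaller factor gives finer rows in the heatmap at the cost of more series per histogram, so it is worth choosing the range first and the factor second.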

Useful PromQL for heatmaps

# Per-bucket rate (heatmap input)
sum by (le)(rate(http_request_duration_seconds_bucket[5m]))

# Per-bucket increase over the window (counts rather than per-second rates)
sum by (le)(increase(http_request_duration_seconds_bucket[1m]))

# Overlay percentile
histogram_quantile(0.99, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))
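
Prometheus `_bucket` series are cumulative: each `le` series counts every observation at or below its bound, while a heatmap cell needs the count inside one bucket only. The following sketch shows that subtraction for a single time step. The function name and the sample numbers are made up for illustration, and Grafana's own conversion may differ in detail.

```python
def per_bucket_counts(cumulative: dict[str, float]) -> dict[str, float]:
    """Convert cumulative `le` values into the amounts inside each bucket."""
    # Sort by numeric upper bound; float("+Inf") parses to infinity.
    ordered = sorted(cumulative.items(), key=lambda item: float(item[0]))
    result: dict[str, float] = {}
    previous = 0.0
    for le, value in ordered:
        # Clamp at zero in case floating-point noise makes neighbours
        # slightly non-monotonic.
        result[le] = max(value - previous, 0.0)
        previous = value
    return result


# Made-up per-second rates for one time step, keyed by the `le` label.
sample = {"0.1": 40.0, "0.5": 90.0, "1": 98.0, "+Inf": 100.0}
cells = per_bucket_counts(sample)
```

With these numbers, the busiest cell is the 0.1 to 0.5 second bucket, even though the largest raw value belongs to `+Inf`. That gap is why a panel fed cumulative values without conversion looks brightest at the top.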

Panel config

{
  "type": "heatmap",
  "title": "Request Latency Heatmap",
  "datasource": "Prometheus",
  "targets": [{
    "expr": "sum by (le)(rate(http_request_duration_seconds_bucket[5m]))",
    "format": "heatmap",
    "legendFormat": "{{le}}"
  }],
  "options": {
    "calculate": false,
    "yAxis": {
      "unit": "s",
      "logBase": 2,
      "min": "0.001",
      "max": "10"
    },
    "color": {
      "mode": "scheme",
      "scheme": "RdYlGn",
      "steps": 16,
      "reverse": true
    },
    "exemplars": { "color": "rgba(255,0,255,0.7)" }
  }
}

(With "reverse": true, the RdYlGn scheme runs from green to red, so red cells mark the highest counts.)

Multi-panel pattern (heatmap + percentile overlay)

{
  "type": "timeseries",
  "title": "Latency p50 / p99",
  "targets": [
    {
      "expr": "histogram_quantile(0.99, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))",
      "legendFormat": "p99"
    },
    {
      "expr": "histogram_quantile(0.50, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))",
      "legendFormat": "p50"
    }
  ]
}

(Place this time series panel next to the heatmap panel so the percentile lines can be read against the distribution.)
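
To read percentile lines against a heatmap, it helps to know how a quantile is estimated from buckets. The sketch below is a simplified approximation of the linear interpolation that `histogram_quantile` performs on classic histograms. It skips edge cases the real function handles (NaN inputs, a missing `+Inf` bucket, non-monotonic buckets), and the function name and sample numbers are invented.

```python
def estimate_quantile(q: float, cumulative: dict[str, float]) -> float:
    """Estimate quantile q (between 0 and 1) from cumulative `le` values."""
    buckets = sorted((float(le), value) for le, value in cumulative.items())
    total = buckets[-1][1]  # the +Inf bucket holds the overall count
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for i, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if upper_bound == float("inf"):
                # Rank falls in the open-ended bucket: report the highest
                # finite bound, since there is nothing to interpolate toward.
                return buckets[i - 1][0]
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return float("nan")


# Made-up cumulative values for one time step.
sample = {"0.1": 40.0, "0.5": 90.0, "1": 98.0, "+Inf": 100.0}
p50 = estimate_quantile(0.50, sample)
p99 = estimate_quantile(0.99, sample)
```

In this sample the p99 rank lands in the `+Inf` bucket, so the estimate is pinned to the highest finite bound of 1 second. A percentile line that sits flat on the top bucket bound is a hint that the bounds are too low for the tail.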

Common findings this catches

  • Blank heatmap → query format not set to “heatmap”, or the query is missing by (le).
  • Single horizontal band → all requests land in one bucket; the metric needs finer buckets.
  • Linear y-axis with data spanning orders of magnitude → switch to log scale.
  • Wrong unit → seconds vs milliseconds confusion.
  • Slow heatmap → too many buckets over too long a time range; reduce one or both.
  • Low-traffic periods showing as cold colors → expected if traffic varies.
  • Bimodal distribution not visible → bucket bounds too coarse.
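
The first finding in this list can be checked mechanically before opening the panel editor. The sketch below is a hypothetical linter, not a Grafana feature: it inspects a panel dictionary shaped like the JSON in this article and reports a wrong query format or an aggregation that drops `le`. The string matching on the query is deliberately naive and only recognizes `sum`.

```python
import re


def lint_heatmap_panel(panel: dict) -> list[str]:
    """Return human-readable findings for a heatmap panel definition."""
    findings: list[str] = []
    if panel.get("type") != "heatmap":
        findings.append("panel type is not 'heatmap'")
    for index, target in enumerate(panel.get("targets", [])):
        expr = target.get("expr", "")
        if target.get("format") != "heatmap":
            findings.append(f"target {index}: format is not 'heatmap'")
        # Naive check: a sum over bucket series should keep the `le` label.
        aggregates = re.search(r"\bsum\b", expr) is not None
        keeps_le = re.search(r"by\s*\(\s*[^)]*\ble\b", expr) is not None
        if "_bucket" in expr and aggregates and not keeps_le:
            findings.append(f"target {index}: aggregation does not keep 'le'")
    return findings


# A deliberately broken panel: wrong format and `le` summed away.
broken = {
    "type": "heatmap",
    "targets": [{
        "expr": "sum(rate(http_request_duration_seconds_bucket[5m]))",
        "format": "time_series",
    }],
}
findings = lint_heatmap_panel(broken)
```

Running a check like this over exported dashboard JSON is one way to catch blank heatmaps in review rather than in production.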

When to escalate

  • Adding histograms to app — coordinate with app team.
  • Bucket bound design — domain knowledge.
  • Custom histogram backend — engineering.
