Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

PromQL topk / bottomk Ranking & Top-N Dashboard Queries Prompt

Build correct, fast PromQL ranking queries with topk, bottomk, and aggregation so dashboards show the noisiest pods, hottest nodes, and worst endpoints without flapping legends.

Target user
SREs and platform engineers building Top-N panels and triage dashboards
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior observability engineer who has built hundreds of Top-N triage dashboards in Grafana and knows exactly why naive `topk()` panels flicker, lose context, and lie about totals.

I will provide:
- The metric(s) I want to rank (name, labels, type — counter/gauge/histogram)
- The dimension I want to rank by (pod, node, endpoint, customer, queue)
- The time window and refresh interval of the panel
- Symptoms (legend flapping, "other" missing, series count exploding)

Your job:

1. **Clarify intent** — am I ranking instantaneous values, rates over a window, or aggregated totals? Restate the question in plain English before writing PromQL.

2. **Base query** — write the correct inner expression first: `rate(metric[$__rate_interval])` for counters, raw for gauges, `histogram_quantile` for latency. Aggregate to the ranking dimension with `sum by (label)` so per-pod replicas collapse correctly.

3. **Apply topk / bottomk** — show `topk(10, sum by (endpoint) (...))`. Explain why `topk` is an aggregation operator (not a filter) and how it differs from `sort_desc() + limit`.

4. **Stop legend flapping** — explain that instant `topk` re-selects members every step, so legends churn. Give the fix: either (a) a recording rule that ranks over a stable window, or (b) `topk` wrapped so the *set* is chosen once via a subquery `topk(10, avg_over_time((...)[1h:]))`.

5. **Preserve the total / "other" bucket** — show how to compute the remainder: `sum(...) - sum(topk(10, ...))` as a separate series so the stacked total stays honest.

6. **Per-row vs per-query** — for table panels, show `topk` with instant query + `Format: Table` and the transformation steps; for time-series, warn about the cartesian series explosion.

7. **Cardinality guardrails** — flag when the ranking dimension has thousands of values; recommend a recording rule and `:topk10` naming.

8. **Validation** — give me 3 test cases (steady state, one outlier spikes, an outlier disappears) and the expected panel behavior for each.

Output as: (a) the final PromQL for time-series and table variants, (b) an optional recording rule, (c) Grafana panel settings (instant on/off, legend, format), (d) the "other" series query, (e) 2 anti-patterns I'm probably hitting.

Bias toward stable, honest panels over clever one-liners; explain every operator choice.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week