Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Grafana Dashboard Performance Prompt

Optimize Grafana dashboards — query parallelism, refresh rates, variable design, panel count, data source pressure.

Target user
SREs maintaining Grafana dashboards at scale
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who has cleaned up Grafana dashboards across teams — reducing load times from minutes to seconds.

I will provide:
- The slow dashboard
- Symptom (slow load, high data source CPU, queries timing out)
- Panel count, variable count

Your job:

1. **Panel count**:
   - Each panel = one (or more) queries
   - Too many → data source overwhelmed
   - Consolidate, hide collapsed by default
2. **Refresh rates**:
   - Tab open with 5s refresh × 50 panels × many users = data source DoS
   - 30s default reasonable; 1m for less critical
3. **Time ranges**:
   - Default range affects query load
   - Long ranges + small intervals = many samples
4. **Variables**:
   - Cached vs reloaded each refresh
   - Multi-value with many options = expensive
   - Hide unused
5. **For query performance**:
   - Use recording rules for hot queries
   - Reduce label noise
   - Avoid join-heavy queries
6. **For data source pressure**:
   - Per-DS connection limits
   - Long-range queries pin connections
7. **For panel-specific**:
   - Stat panels query once (cheap)
   - Time series with many series = expensive
   - Table panels efficient for top-N
8. **For variable design**:
   - "All" with many values → many queries
   - Conditional panels based on variable

Mark DESTRUCTIVE: setting auto-refresh too short (DoS), variables with `All` selecting thousands, dashboards loading at user-bypassed time ranges.

---

Slow dashboard: [DESCRIBE]
Panel + variable count: [DESCRIBE]
Symptom: [DESCRIBE]

Why this prompt works

Slow dashboards have many causes. This prompt walks them.

How to use it

  1. Audit panel count and refresh.
  2. For hot queries, recording rules.
  3. Limit time ranges.
  4. Tune variables.

Useful commands

# Grafana CLI for dashboard analysis (Grafana 10+)
# Or use API

# Get dashboard
curl -u admin:admin http://grafana:3000/api/dashboards/uid/<uid> | \
    jq '.dashboard.panels | length'                    # panel count

curl -u admin:admin http://grafana:3000/api/dashboards/uid/<uid> | \
    jq '.dashboard.refresh'                            # refresh interval

# Find dashboards with many panels
curl -u admin:admin "http://grafana:3000/api/search?type=dash-db&limit=100" | \
    jq '.[].uid' | \
    while read UID; do
        UID_CLEAN=$(echo $UID | tr -d '"')
        COUNT=$(curl -s -u admin:admin "http://grafana:3000/api/dashboards/uid/$UID_CLEAN" | jq '.dashboard.panels | length')
        echo "$COUNT $UID_CLEAN"
    done | sort -rn | head

# Per-data source query rate (in Prometheus)
sum by (datasource)(rate(grafana_datasource_response_duration_seconds_count[5m]))

Optimization patterns

Consolidate variables

Bad:
- $instance: query=node_memory_MemTotal_bytes (all hostnames)
- $job: query=up (all jobs)
- $cluster: query=up (all clusters)
- $datacenter: query=node_uname_info (all DCs)

Good:
- $datacenter: query=label_values(up, datacenter)
- $cluster: query=label_values(up{datacenter=$datacenter}, cluster)
  → cascading, smaller result sets

Reduce panel count via row collapse

{
  "title": "Detailed metrics",
  "type": "row",
  "collapsed": true,                  // collapsed by default; queries don't run
  "panels": [...]
}

Recording rule for hot panel

Instead of complex query in panel:

# recording rules
- record: job:http_requests:rate5m
  expr: sum by (job)(rate(http_requests_total[5m]))

In panel: job:http_requests:rate5m — fast.

Common findings this catches

  • 100+ panels → split into multiple dashboards.
  • 5s refresh for non-realtime → 30s+.
  • All variables fanning out → restrict to specific.
  • Long range with short step → many samples; tune step.
  • Cardinality in legend → simplify.
  • Cross-DS panels compound → consolidate.
  • Test dashboards in prod → mark as draft.

When to escalate

  • Dashboard pruning across org — coordination.
  • Data source scaling — strategic.
  • Standardized dashboard templates — team.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week