Grafana Dashboard Performance Prompt
Optimize Grafana dashboards — query parallelism, refresh rates, variable design, panel count, data source pressure.
- Target user
- SREs maintaining Grafana dashboards at scale
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who has cleaned up Grafana dashboards across teams — reducing load times from minutes to seconds. I will provide: - The slow dashboard - Symptom (slow load, high data source CPU, queries timing out) - Panel count, variable count Your job: 1. **Panel count**: - Each panel = one (or more) queries - Too many → data source overwhelmed - Consolidate, hide collapsed by default 2. **Refresh rates**: - Tab open with 5s refresh × 50 panels × many users = data source DoS - 30s default reasonable; 1m for less critical 3. **Time ranges**: - Default range affects query load - Long ranges + small intervals = many samples 4. **Variables**: - Cached vs reloaded each refresh - Multi-value with many options = expensive - Hide unused 5. **For query performance**: - Use recording rules for hot queries - Reduce label noise - Avoid join-heavy queries 6. **For data source pressure**: - Per-DS connection limits - Long-range queries pin connections 7. **For panel-specific**: - Stat panels query once (cheap) - Time series with many series = expensive - Table panels efficient for top-N 8. **For variable design**: - "All" with many values → many queries - Conditional panels based on variable Mark DESTRUCTIVE: setting auto-refresh too short (DoS), variables with `All` selecting thousands, dashboards loading at user-bypassed time ranges. --- Slow dashboard: [DESCRIBE] Panel + variable count: [DESCRIBE] Symptom: [DESCRIBE]
Why this prompt works
Slow dashboards have many causes. This prompt walks them.
How to use it
- Audit panel count and refresh.
- For hot queries, recording rules.
- Limit time ranges.
- Tune variables.
Useful commands
# Grafana CLI for dashboard analysis (Grafana 10+)
# Or use API
# Get dashboard
curl -u admin:admin http://grafana:3000/api/dashboards/uid/<uid> | \
jq '.dashboard.panels | length' # panel count
curl -u admin:admin http://grafana:3000/api/dashboards/uid/<uid> | \
jq '.dashboard.refresh' # refresh interval
# Find dashboards with many panels
curl -u admin:admin "http://grafana:3000/api/search?type=dash-db&limit=100" | \
jq '.[].uid' | \
while read UID; do
UID_CLEAN=$(echo $UID | tr -d '"')
COUNT=$(curl -s -u admin:admin "http://grafana:3000/api/dashboards/uid/$UID_CLEAN" | jq '.dashboard.panels | length')
echo "$COUNT $UID_CLEAN"
done | sort -rn | head
# Per-data source query rate (in Prometheus)
sum by (datasource)(rate(grafana_datasource_response_duration_seconds_count[5m]))
Optimization patterns
Consolidate variables
Bad:
- $instance: query=node_memory_MemTotal_bytes (all hostnames)
- $job: query=up (all jobs)
- $cluster: query=up (all clusters)
- $datacenter: query=node_uname_info (all DCs)
Good:
- $datacenter: query=label_values(up, datacenter)
- $cluster: query=label_values(up{datacenter=$datacenter}, cluster)
→ cascading, smaller result sets
Reduce panel count via row collapse
{
"title": "Detailed metrics",
"type": "row",
"collapsed": true, // collapsed by default; queries don't run
"panels": [...]
}
Recording rule for hot panel
Instead of complex query in panel:
# recording rules
- record: job:http_requests:rate5m
expr: sum by (job)(rate(http_requests_total[5m]))
In panel: job:http_requests:rate5m — fast.
Common findings this catches
- 100+ panels → split into multiple dashboards.
- 5s refresh for non-realtime → 30s+.
Allvariables fanning out → restrict to specific.- Long range with short step → many samples; tune step.
- Cardinality in legend → simplify.
- Cross-DS panels compound → consolidate.
- Test dashboards in prod → mark as draft.
When to escalate
- Dashboard pruning across org — coordination.
- Data source scaling — strategic.
- Standardized dashboard templates — team.
Related prompts
-
Grafana Templating & Variables Design Prompt
Design Grafana variables — query variables, custom, interval, chained, multi-value, regex; debug missing values, slow load.
-
PromQL Query Optimization Prompt
Diagnose slow PromQL queries — cardinality explosion, range vector traps, sum vs avg pitfalls, query timeout, recording rules opportunity.
-
PromQL Recording Rules Design Prompt
Design Prometheus recording rules — naming convention, evaluation interval, when to use, retention, multi-cluster patterns.