Grafana Error Guide: Panel/Alert Image Render Timeout — Tune the Renderer
Fix Grafana panel and alert image render timeouts — raise rendering timeouts, give the renderer more CPU/memory, fix slow queries and callback_url, and stop concurrent render overload.
- #grafana
- #troubleshooting
- #errors
- #image-renderer
Overview
Even with the image renderer installed, a render can time out. The renderer launches headless Chromium, loads the dashboard back through Grafana, waits for every panel query to finish, then captures a PNG. If any step exceeds the configured timeout — slow queries, an overloaded renderer, or a wrong callback URL — the render aborts and the image is missing or blank.
The literal errors you will see:
logger=rendering.http error="Get \"http://renderer:8081/render?...\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
logger=rendering error="ScreenshotOnRender: rendering timed out"
{"level":"error","msg":"Request failed","err":"Navigation timeout of 30000 ms exceeded","url":"http://grafana:3000/d-solo/..."}
It surfaces most often for alert images (each firing alert triggers a render on a tight deadline) and for large “all panels” dashboard PNGs. The dashboard opens fine interactively — only the timed, server-side capture fails.
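The three messages above come from different points in the pipeline, so it helps to sort log lines by which side gave up. The following is a minimal sketch; the function name and the stage labels are illustrative assumptions, not Grafana terminology.

```shell
# Map a log line to the part of the render pipeline that most likely timed out.
# The labels printed here are hypothetical names chosen for this guide.
classify_render_error() {
  case "$1" in
    *"Client.Timeout exceeded"*|*"context deadline exceeded"*)
      # Grafana's HTTP call to the renderer ran out of time
      echo "grafana-http-client" ;;
    *"Navigation timeout"*)
      # Chromium inside the renderer could not finish loading the page
      echo "chromium-navigation" ;;
    *"rendering timed out"*)
      # Grafana stopped waiting for the screenshot
      echo "grafana-render-deadline" ;;
    *)
      echo "other" ;;
  esac
}
```

Feeding it each matching line from either log gives a rough split between Grafana-side and renderer-side timeouts.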
Symptoms
- Alert notifications arrive without images; the log shows context deadline exceeded at eval time.
- Rendered-image share links spin, then return a blank/broken PNG.
- Renderer logs show Navigation timeout ... exceeded.
- Timeouts correlate with heavy dashboards or many alerts firing at once.
docker logs grafana-renderer 2>&1 | grep -iE "timeout|deadline" | tail
{"err":"Navigation timeout of 30000 ms exceeded","msg":"Request failed"}
Common Root Causes
1. Render timeout too low for the dashboard’s queries
The default per-render timeout is modest. A dashboard whose panels each take several seconds to query can’t finish a full-page screenshot in time.
[rendering]
; total seconds allowed for a render
rendering_timeout = 30
2. Renderer is CPU/memory starved
Headless Chromium is heavy. An undersized renderer container (or the in-process plugin on a small Grafana pod) throttles and blows the deadline.
kubectl top pod -n monitoring -l app=grafana-renderer
NAME                    CPU(cores)   MEMORY(bytes)
grafana-renderer-xxxx   980m         740Mi
3. Concurrent render overload
When many alerts fire together, render requests queue. Beyond concurrent_render_request_limit, requests wait and then time out.
[rendering]
concurrent_render_request_limit = 10
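To see why queued renders time out, a rough back-of-the-envelope model helps: if requests arrive together and are served in batches, the last one finishes after ceil(requests / concurrency) batches. The sketch below assumes every render takes the same time and that nothing else slows the renderer, which is optimistic under real load; the function name is hypothetical.

```shell
# Estimate when the last of a burst of renders completes, in seconds.
# Assumes uniform render time and perfectly batched processing (a simplification).
last_render_finish_seconds() {
  requests=$1
  concurrency=$2
  render_seconds=$3
  # Integer ceiling of requests / concurrency
  waves=$(( (requests + concurrency - 1) / concurrency ))
  echo $(( waves * render_seconds ))
}
```

For example, `last_render_finish_seconds 40 5 9` gives 72: forty simultaneous 9-second renders at a concurrency of 5 need eight batches, so the tail of the queue lands well past a 30-second deadline.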
4. Wrong callback_url — renderer can’t load the dashboard back
If callback_url/root_url resolves to an address the renderer can’t reach, Chromium waits on a page that never loads and hits the navigation timeout.
curl -s -o /dev/null -w "%{http_code}\n" http://grafana:3000/api/health
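The status code printed by that curl command is only meaningful when it is run from where the renderer runs. As a small sketch, the helper below turns the code into a hint; the function name and the wording of the hints are assumptions, while 000 is what curl prints for %{http_code} when it received no HTTP response at all.

```shell
# Interpret the status code curl printed for the Grafana health endpoint.
interpret_health_code() {
  case "$1" in
    200) echo "reachable" ;;
    000) echo "no HTTP response" ;;   # DNS failure, refused connection, or timeout
    3??) echo "redirect" ;;           # often a root_url or proxy mismatch
    *)   echo "unexpected status $1" ;;
  esac
}

# Example call, to be run inside the renderer's container or pod
# (the address is the one used in the check above):
# interpret_health_code "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://grafana:3000/api/health)"
```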
5. Slow data source underneath
The render is only as fast as the slowest panel query. A struggling Prometheus/Loki/SQL source pushes the whole capture past its deadline.
Diagnostic Workflow
Step 1: Reproduce a single render and time it
time curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
"http://localhost:3000/render/d/<uid>/<slug>?width=1600&height=900&kiosk" \
-o /tmp/r.png; file /tmp/r.png
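A failed render can still leave a file behind (for example a JSON error body), so it is worth confirming the output really is a PNG. The sketch below checks the fixed 8-byte PNG signature; the function name is hypothetical, and it does not verify that the image is non-blank.

```shell
# Succeeds (exit 0) only if the file starts with the PNG signature
# 89 50 4e 47 0d 0a 1a 0a.
is_png() {
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Example: is_png /tmp/r.png && echo "got an image" || echo "not a PNG"
```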
Step 2: Check renderer resource usage
kubectl top pod -n monitoring -l app=grafana-renderer
kubectl describe deploy/grafana-renderer -n monitoring | grep -A3 -i limits
Step 3: Read both sides’ logs
journalctl -u grafana-server --no-pager | grep -iE "render|deadline" | tail -20
docker logs grafana-renderer 2>&1 | grep -iE "timeout|failed" | tail -20
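The renderer's message embeds the deadline Chromium was actually working against, which may differ from what Grafana is configured with. A small filter, sketched here with a hypothetical name, pulls that number out of the log stream so the two sides can be compared.

```shell
# Read log lines on stdin and print the millisecond value from each
# "Navigation timeout of N ms exceeded" message.
navigation_timeout_ms() {
  sed -n 's/.*Navigation timeout of \([0-9][0-9]*\) ms exceeded.*/\1/p'
}

# Example: docker logs grafana-renderer 2>&1 | navigation_timeout_ms | sort | uniq -c
```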
Step 4: Raise timeouts and concurrency sensibly
[rendering]
server_url = http://grafana-renderer:8081/render
callback_url = http://grafana:3000/
rendering_timeout = 60
concurrent_render_request_limit = 30
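Tune the renderer container to match. The block below enables timing metrics and caps clustered concurrency; it does not set a timeout itself, so check your renderer version’s documentation for its timeout variable (RENDERING_CLUSTERING_TIMEOUT in releases that support it):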
env:
- name: RENDERING_TIMING_METRICS
value: "true"
- name: RENDERING_MODE
value: "clustered"
- name: RENDERING_CLUSTERING_MODE
value: "context"
- name: RENDERING_CLUSTERING_MAX_CONCURRENCY
value: "5"
Step 5: Fix the slow query, not just the timeout
curl -s -X POST "http://localhost:3000/api/ds/query" ... # expects a POST with a JSON body; or open the panel with the query inspector
Reduce the time range, add recording rules, or lower resolution for the rendered variant.
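One way to get a lighter rendered variant is to pass a narrower time range on the render URL itself rather than editing the dashboard. The helper below is a sketch with a hypothetical name; the width and height values are arbitrary assumptions, and the uid, slug, and panel ID in the example are the ones that appear in the sample log line in this guide.

```shell
# Build a single-panel render URL with an explicit, shorter time range.
# Arguments: base URL, dashboard uid, slug, panel ID, range start (e.g. now-1h).
solo_render_url() {
  base=$1
  uid=$2
  slug=$3
  panel_id=$4
  from=$5
  printf '%s/render/d-solo/%s/%s?panelId=%s&from=%s&to=now&width=1000&height=500\n' \
    "$base" "$uid" "$slug" "$panel_id" "$from"
}

# Example: solo_render_url http://localhost:3000 abc prod-overview 8 now-1h
```

Timing this URL against the full-range one shows how much of the render is spent waiting on the query.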
Example Root Cause Analysis
At peak, an on-call notices alert Slack messages arrive image-less. The renderer log:
{"err":"Navigation timeout of 30000 ms exceeded","url":"http://grafana:3000/d-solo/abc/prod-overview?panelId=8"}
Interactively the panel loads in ~9s, but during an incident 40 alerts fire simultaneously. kubectl top shows the single renderer pod pinned at its CPU limit. The render queue backs up past concurrent_render_request_limit, and each waiting request eats into its own 30s deadline.
Fix: scale the renderer horizontally and raise limits:
kubectl scale deploy/grafana-renderer -n monitoring --replicas=3
[rendering]
rendering_timeout = 60
concurrent_render_request_limit = 30
After the rollout, the same alert storm delivers all images within a few seconds. Root cause: not a bug, but render capacity — concurrency limits plus an undersized renderer under load.
Prevention Best Practices
- Right-size the renderer: give it real CPU/memory requests+limits, and scale replicas for alert-heavy setups.
- Match timeouts on both sides: Grafana’s rendering_timeout and the renderer’s clustering/timeout envs should agree and exceed your slowest dashboard.
- Keep alert-image dashboards lean: fewer panels, shorter ranges, recording rules; consider a dedicated “for-rendering” dashboard.
- Cap and monitor concurrency (concurrent_render_request_limit, RENDERING_CLUSTERING_MAX_CONCURRENCY) so storms queue gracefully.
- Verify callback_url reachability from the renderer after any proxy/DNS change.
- See more Grafana guides and the sibling “renderer not available” guide.
Quick Command Reference
# Time a single render
time curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
"http://localhost:3000/render/d/<uid>/<slug>?width=1600&height=900&kiosk" -o /tmp/r.png
# Renderer resource pressure
kubectl top pod -n monitoring -l app=grafana-renderer
kubectl describe deploy/grafana-renderer -n monitoring | grep -A3 -i limits
# Logs, both sides
journalctl -u grafana-server | grep -iE "render|deadline" | tail
docker logs grafana-renderer 2>&1 | grep -iE "timeout|failed" | tail
# Scale renderer
kubectl scale deploy/grafana-renderer -n monitoring --replicas=3
Conclusion
A render timeout means the pipeline — Chromium launch, callback load, panel queries, capture — didn’t finish inside the deadline. Typical root causes:
- rendering_timeout set too low for the dashboard’s real query time.
- A CPU/memory-starved renderer that throttles under load.
- Concurrent render overload when many alerts fire together.
- A wrong callback_url so Chromium waits on a page it can’t load.
- A slow data source dragging the whole capture past its deadline.
Reproduce one render and time it first. If it is fast alone but slow under load, you are chasing capacity or concurrency; if it is slow even alone, look for a genuinely slow dashboard or an unreachable callback.