Grafana Error Guide: Panel/Alert Image Render Timeout

Overview

Even with the image renderer installed, a render can time out. The renderer launches headless Chromium, loads the dashboard back through Grafana, waits for every panel query to finish, then captures a PNG. If that sequence takes longer than the configured timeout (typically because of slow queries, an overloaded renderer, or a wrong callback URL), the render aborts and the image is missing or blank.

The literal errors you will see:

logger=rendering.http error="Get \"http://renderer:8081/render?...\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

logger=rendering error="ScreenshotOnRender: rendering timed out"

{"level":"error","msg":"Request failed","err":"Navigation timeout of 30000 ms exceeded","url":"http://grafana:3000/d-solo/..."}

These timeouts surface most often for alert images (each firing alert triggers a render on a tight deadline) and for large “all panels” dashboard PNGs. The dashboard opens fine interactively; only the timed, server-side capture fails.
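To see at a glance which side gave up, the three messages above can be bucketed by their distinguishing substring. This is a minimal sketch that assumes one log entry per line on stdin; the function name classify_render_timeouts and the bucket labels are made up for illustration and are not Grafana terminology.

```shell
# classify_render_timeouts: read log lines on stdin and count timeouts per side.
# Bucket labels are illustrative names chosen for this sketch.
classify_render_timeouts() {
  awk '
    /context deadline exceeded/                { c["grafana_http_client"]++; next }
    /rendering timed out/                      { c["grafana_render"]++; next }
    /Navigation timeout of [0-9]+ ms exceeded/ { c["renderer_navigation"]++; next }
    END {
      printf "grafana_http_client=%d grafana_render=%d renderer_navigation=%d\n",
             c["grafana_http_client"], c["grafana_render"], c["renderer_navigation"]
    }'
}
```

Feeding it both logs, for example docker logs grafana-renderer 2>&1 | classify_render_timeouts, shows whether Grafana's HTTP client or Chromium's page navigation is hitting the deadline more often.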

Symptoms

Alert notifications arrive without images; log shows context deadline exceeded at eval time.
Rendered-image share links spin then return a blank/broken PNG.
Renderer logs show Navigation timeout ... exceeded.
Timeouts correlate with heavy dashboards or many alerts firing at once.

docker logs grafana-renderer 2>&1 | grep -iE "timeout|deadline" | tail

{"err":"Navigation timeout of 30000 ms exceeded","msg":"Request failed"}

Common Root Causes

1. Render timeout too low for the dashboard’s queries

The default per-render timeout is modest. A dashboard whose panels each take several seconds to query can’t finish a full-page screenshot in time.

[rendering]
; total seconds allowed for a render
rendering_timeout = 30

2. Renderer is CPU/memory starved

Headless Chromium is heavy. An undersized renderer container (or the in-process plugin on a small Grafana pod) throttles and blows the deadline.

kubectl top pod -n monitoring -l app=grafana-renderer

NAME                    CPU(cores)   MEMORY(bytes)
grafana-renderer-xxxx   980m         740Mi

3. Concurrent render overload

When many alerts fire together, render requests queue. Beyond concurrent_render_request_limit, requests wait and then time out.

[rendering]
concurrent_render_request_limit = 10
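
A rough way to see why a burst blows the deadline: if renders are served in waves of at most the limit and each takes about the same time, the last request finishes after ceil(burst / limit) waves. The sketch below does only that arithmetic. It is a simplified model (uniform render time, strict waves), and the function name and example numbers are assumptions rather than measurements.

```shell
# last_render_finish_s BURST LIMIT RENDER_SECONDS
# Prints the approximate second at which the last queued render finishes,
# assuming renders run in waves of LIMIT and each takes RENDER_SECONDS.
last_render_finish_s() (
  burst=$1; limit=$2; render_s=$3
  waves=$(( (burst + limit - 1) / limit ))   # integer ceil(burst / limit)
  echo $(( waves * render_s ))
)
```

With 40 simultaneous alerts, a limit of 10 and 9 seconds per render, last_render_finish_s 40 10 9 prints 36, which is already past a 30-second timeout even though no single render is slow.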

4. Wrong callback_url — renderer can’t load the dashboard back

If callback_url/root_url resolves to an address the renderer can’t reach, Chromium waits on a page that never loads and hits the navigation timeout.

curl -s -o /dev/null -w "%{http_code}\n" http://grafana:3000/api/health
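
The check is only meaningful when it runs from where the renderer runs (its container or pod), since that is the network path Chromium uses. curl prints 000 for %{http_code} when it gets no HTTP response at all, so the printed code can be mapped to a short verdict. The helper below is a sketch; its name and wording are illustrative.

```shell
# callback_verdict HTTP_CODE
# Turn the status printed by curl -w "%{http_code}" into a short verdict.
callback_verdict() {
  case "$1" in
    000) echo "unreachable: no HTTP response (DNS, routing, or connect failure)" ;;
    2??) echo "reachable" ;;
    3??) echo "redirected: check root_url/callback_url scheme and host" ;;
    *)   echo "reachable but unhealthy: HTTP $1" ;;
  esac
}
```

For example: callback_verdict "$(curl -s -o /dev/null -m 5 -w '%{http_code}' http://grafana:3000/api/health)". The -m 5 caps the probe at five seconds so an unreachable host fails quickly instead of hanging.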

5. Slow data source underneath

The render is only as fast as the slowest panel query. A struggling Prometheus/Loki/SQL source pushes the whole capture past its deadline.

Diagnostic Workflow

Step 1: Reproduce a single render and time it

time curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "http://localhost:3000/render/d/<uid>/<slug>?width=1600&height=900&kiosk" \
  -o /tmp/r.png; file /tmp/r.png
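
Because curl -s saves whatever body comes back, the output file may be an error response rather than an image. In a script, the file check can be done by comparing the first eight bytes against the PNG signature. A minimal sketch; the function name is_png is hypothetical.

```shell
# is_png FILE: succeed only if FILE starts with the 8-byte PNG signature
# (89 50 4e 47 0d 0a 1a 0a). Runs in a subshell so no variables leak.
is_png() (
  sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
)
```

Usage: is_png /tmp/r.png && echo "got an image" || echo "not a PNG, inspect the body". A valid signature does not rule out a blank render, so the image still needs a visual check.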

Step 2: Check renderer resource usage

kubectl top pod -n monitoring -l app=grafana-renderer
kubectl describe deploy/grafana-renderer -n monitoring | grep -A3 -i limits

Step 3: Read both sides’ logs

journalctl -u grafana-server --no-pager | grep -iE "render|deadline" | tail -20
docker logs grafana-renderer 2>&1 | grep -iE "timeout|failed" | tail -20

Step 4: Raise timeouts and concurrency sensibly

[rendering]
server_url = http://grafana-renderer:8081/render
callback_url = http://grafana:3000/
rendering_timeout = 60
concurrent_render_request_limit = 30

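Set the renderer container’s own timeout and concurrency to match:

env:
  - name: RENDERING_TIMING_METRICS
    value: "true"
  - name: RENDERING_MODE
    value: "clustered"
  - name: RENDERING_CLUSTERING_MODE
    value: "context"
  - name: RENDERING_CLUSTERING_MAX_CONCURRENCY
    value: "5"
  # seconds a clustered render may take; keep in line with rendering_timeout
  - name: RENDERING_CLUSTERING_TIMEOUT
    value: "60"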

Step 5: Fix the slow query, not just the timeout

curl -s -X POST "http://localhost:3000/api/ds/query" ... # the endpoint expects a POST; or open the panel with the query inspector

Reduce the time range, add recording rules, or lower resolution for the rendered variant.

Example Root Cause Analysis

At peak, an on-call notices alert Slack messages arrive image-less. The renderer log:

{"err":"Navigation timeout of 30000 ms exceeded","url":"http://grafana:3000/d-solo/abc/prod-overview?panelId=8"}

Interactively the panel loads in ~9s, but during an incident 40 alerts fire simultaneously. kubectl top shows the single renderer pod pinned at its CPU limit. The render queue backs up past concurrent_render_request_limit, and each waiting request eats into its own 30s deadline.

Fix: scale the renderer horizontally and raise limits:

kubectl scale deploy/grafana-renderer -n monitoring --replicas=3

[rendering]
rendering_timeout = 60
concurrent_render_request_limit = 30

After the rollout, the same alert storm delivers all images within a few seconds. Root cause: not a bug, but render capacity — concurrency limits plus an undersized renderer under load.
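
The replica count can be sanity-checked with the same kind of wave arithmetic: work out how many full renders fit inside the deadline, how many concurrent slots the burst then needs, and how many replicas supply those slots. This is a sketch under simplifying assumptions (uniform render time, even load spreading across replicas); the function name is hypothetical, and the per-replica concurrency of 5 is taken from the RENDERING_CLUSTERING_MAX_CONCURRENCY example earlier.

```shell
# replicas_needed BURST RENDER_SECONDS TIMEOUT_SECONDS PER_REPLICA_CONCURRENCY
# Prints how many renderer replicas the simplified wave model asks for,
# or "none" when a single render already exceeds the timeout.
replicas_needed() (
  burst=$1; render_s=$2; timeout_s=$3; per_replica=$4
  waves=$(( timeout_s / render_s ))            # full waves that fit in the deadline
  if [ "$waves" -lt 1 ]; then echo "none"; exit 0; fi
  slots=$(( (burst + waves - 1) / waves ))     # ceil(burst / waves) concurrent renders
  echo $(( (slots + per_replica - 1) / per_replica ))
)
```

For the incident above, replicas_needed 40 9 30 5 prints 3. A result of none means no amount of scaling helps, and the fix has to be a faster query or a longer timeout.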

Prevention Best Practices

Right-size the renderer: give it real CPU/memory requests+limits, and scale replicas for alert-heavy setups.
Match timeouts on both sides: Grafana’s rendering_timeout and the renderer’s clustering/timeout envs should agree and exceed your slowest dashboard.
Keep alert-image dashboards lean: fewer panels, shorter ranges, recording rules; consider a dedicated “for-rendering” dashboard.
Cap and monitor concurrency (concurrent_render_request_limit, RENDERING_CLUSTERING_MAX_CONCURRENCY) so storms queue gracefully.
Verify callback_url reachability from the renderer after any proxy/DNS change.

Quick Command Reference

# Time a single render
time curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "http://localhost:3000/render/d/<uid>/<slug>?width=1600&height=900&kiosk" -o /tmp/r.png

# Renderer resource pressure
kubectl top pod -n monitoring -l app=grafana-renderer
kubectl describe deploy/grafana-renderer -n monitoring | grep -A3 -i limits

# Logs, both sides
journalctl -u grafana-server | grep -iE "render|deadline" | tail
docker logs grafana-renderer 2>&1 | grep -iE "timeout|failed" | tail

# Scale renderer
kubectl scale deploy/grafana-renderer -n monitoring --replicas=3

Conclusion

A render timeout means the pipeline — Chromium launch, callback load, panel queries, capture — didn’t finish inside the deadline. Typical root causes:

rendering_timeout set too low for the dashboard’s real query time.
A CPU/memory-starved renderer that throttles under load.
Concurrent render overload when many alerts fire together.
A wrong callback_url so Chromium waits on a page it can’t load.
A slow data source dragging the whole capture past its deadline.

Reproduce one render and time it first. If it is fast alone but slow under load, you’re chasing capacity or concurrency; if it is slow even alone, look for a genuinely slow dashboard or an unreachable callback.
