You are a senior platform engineer who has scaled Grafana: moved from SQLite to PostgreSQL/MySQL, deployed multiple replicas, tuned for many concurrent users.

I will provide:
- Current Grafana setup
- User load
- Symptom (slow, OOM, database errors)

Your job:

1. **HA basics**:
   - Multiple Grafana replicas
   - Shared DB (Postgres/MySQL; NOT SQLite)
   - Shared session storage
   - Stateless replicas

2. **For database choice**:
   - SQLite: single-replica, small
   - PostgreSQL: recommended for HA
   - MySQL: also supported

3. **For DB connection pool**:
   - `max_open_conn`, `max_idle_conn`
   - Per-replica
   - Total = replicas × max_open

4. **For sessions**:
   - DB-backed (default)
   - Redis for higher perf

5. **For load balancer**:
   - Sticky sessions OR Redis sessions
   - Health check on /api/health

6. **For Alertmanager / alerting HA**:
   - Each Grafana replica evaluates rules
   - Coordination via DB
   - Lock to prevent duplicate notifications

7. **For caching**:
   - Query result cache (Enterprise)
   - Reduces DS load

8. **For monitoring Grafana itself**:
   - `/metrics` endpoint
   - Prometheus scrape

Mark DESTRUCTIVE: SQLite to PostgreSQL migration without backup, multiple replicas with non-shared DB (data inconsistency), upgrade without DB migration step.

---

Current setup: [DESCRIBE]
User load: [DESCRIBE]
Symptom: [DESCRIBE]

Why this prompt works

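Scaling Grafana means moving off its defaults: the embedded SQLite database, a single instance, and an untuned connection pool. This prompt walks through each of those changes in order.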

How to use it

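Back up the existing database before changing anything.
Migrate from SQLite to PostgreSQL.
Add replicas behind a load balancer.
Tune the connection pool for the total replica count.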
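
Since the prompt marks a SQLite to PostgreSQL migration without a backup as DESTRUCTIVE, the backup step can be sketched as a small shell function. This is a minimal sketch, assuming Grafana is stopped while the file is copied; the function name and the example paths are hypothetical and need adjusting to your install.

```shell
# Sketch: copy the SQLite file to a dated backup before any migration step.
# Assumption: Grafana is stopped, so the file is not being written to.

backup_grafana_sqlite() {
  local src="$1" dest_dir="$2"
  local dest

  if [ ! -f "$src" ]; then
    echo "no database file at $src" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  dest="$dest_dir/grafana-$(date +%F-%H%M%S).db"

  # -p keeps timestamps and permissions on the copy.
  cp -p "$src" "$dest" || return 1

  # Refuse to continue if the copy differs from the source.
  if ! cmp -s "$src" "$dest"; then
    echo "backup verification failed for $dest" >&2
    return 1
  fi

  # Print the backup path so callers can log or archive it.
  echo "$dest"
}

# Example (hypothetical paths):
#   backup_grafana_sqlite /var/lib/grafana/grafana.db /var/backups/grafana
```

The function prints the path of the backup it created, so a migration script can stop early when the call fails.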

Useful commands

# DB connection test
psql -h <db-host> -U grafana -d grafana -c '\dt'

# Grafana metrics
curl -s http://grafana:3000/metrics | head

# Active connections (Postgres)
psql -h <db-host> -U grafana -d grafana -c "SELECT count(*) FROM pg_stat_activity WHERE datname='grafana';"

# Slow queries (requires the pg_stat_statements extension)
psql -h <db-host> -U grafana -d grafana -c "SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

# Backup
pg_dump -h <db-host> -U grafana grafana > grafana-backup-$(date +%F).sql
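
The prompt asks for a load balancer health check on /api/health. The same check can be run by hand across replicas, as in this rough sketch. It assumes the response body contains a `"database": "ok"` field when the instance can reach its database; the function names and replica hostnames are hypothetical.

```shell
# Returns 0 when a /api/health response body reports the database as ok.
health_db_ok() {
  # Match "database": "ok" with optional whitespace around the colon.
  printf '%s' "$1" | tr -d '\n' | grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Checks each replica URL and prints one status line per replica.
# Returns non-zero if any replica fails the check.
check_replicas() {
  local url body rc=0
  for url in "$@"; do
    # --max-time keeps one hung replica from stalling the loop.
    body=$(curl -s --max-time 5 "$url/api/health") || body=""
    if health_db_ok "$body"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      rc=1
    fi
  done
  return "$rc"
}

# Example (hypothetical replica hostnames):
#   check_replicas http://grafana-1:3000 http://grafana-2:3000
```

Checking replicas individually, rather than through the load balancer, shows which instance has lost its database connection.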

Config (PostgreSQL)

[database]
type = postgres
host = postgres.example.com:5432
name = grafana
user = grafana
password = $__file{/etc/grafana/db-password}
ssl_mode = require
# Per replica: replicas × max_open_conn must stay below the PostgreSQL max_connections
max_open_conn = 100
max_idle_conn = 100
conn_max_lifetime = 14400

[remote_cache]
type = redis
connstr = addr=redis:6379,pool_size=100

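# Legacy section for old Grafana releases. Since the 6.x series, login sessions are
# auth tokens stored in the database and this section is no longer read.
[session]
provider = redis
provider_config = addr=redis:6379,pool_size=100,db=2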

[server]
root_url = https://grafana.example.com/

# Image rendering (if needed)
[rendering]
server_url = http://renderer:8081/render
callback_url = http://grafana:3000/

[unified_alerting]
ha_peers = grafana-1:9094,grafana-2:9094
ha_listen_address = 0.0.0.0:9094
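
Every replica needs the same `ha_peers` list, and it is easy to mistype when the replica count changes. The helper below is one way to generate the value, offered as a sketch. The function name is hypothetical, and the hostnames and port are the ones used in the config above.

```shell
# Joins replica hostnames into an ha_peers value: host:port,host:port,...
# Usage: build_ha_peers PORT HOST [HOST...]
build_ha_peers() {
  local port="$1" host peers=""
  shift
  for host in "$@"; do
    # Prefix a comma only when peers already holds an entry.
    peers="${peers:+$peers,}${host}:${port}"
  done
  echo "$peers"
}

# Example:
#   build_ha_peers 9094 grafana-1 grafana-2
```

With the two hosts from the config, the function prints `grafana-1:9094,grafana-2:9094`, which can be pasted into the `ha_peers` line.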

Helm values (HA)

grafana:
  replicas: 3
  podAntiAffinity: hard

  grafana.ini:
    database:
      type: postgres
      host: postgres:5432
      name: grafana
      user: grafana
      password: $__file{/etc/secrets/db-password}
    # Legacy section: only old Grafana releases read it (sessions are DB-backed auth tokens since 6.x)
    session:
      provider: redis
      provider_config: addr=redis:6379

  persistence:
    enabled: false                  # no PV needed with HA + external DB

  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      memory: 1Gi
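
The prompt's rule "Total = replicas × max_open" is worth checking against these values. With `replicas: 3` and the `max_open_conn = 100` from the PostgreSQL config, the worst case is 300 connections, while a stock PostgreSQL install typically allows about 100. The arithmetic can be sketched as below; the function names and the reserve of 10 connections are assumptions for illustration.

```shell
# Prints the worst-case connection count: replicas x max_open_conn.
total_db_connections() {
  echo $(( $1 * $2 ))
}

# Returns 0 when the worst case fits under max_connections minus a reserve
# kept free for admin sessions, backups, and other clients.
# Usage: pool_fits REPLICAS MAX_OPEN_CONN PG_MAX_CONNECTIONS [RESERVE]
pool_fits() {
  local replicas="$1" max_open="$2" pg_max="$3" reserve="${4:-10}"
  local total
  total=$(total_db_connections "$replicas" "$max_open")
  [ "$total" -le $(( pg_max - reserve )) ]
}

# Prints the largest per-replica max_open_conn that still fits.
# Usage: max_open_per_replica REPLICAS PG_MAX_CONNECTIONS [RESERVE]
max_open_per_replica() {
  local replicas="$1" pg_max="$2" reserve="${3:-10}"
  echo $(( (pg_max - reserve) / replicas ))
}

# Example:
#   total_db_connections 3 100     # prints 300
#   max_open_per_replica 3 100     # prints 30
```

Either lower `max_open_conn` per replica or raise the server's `max_connections`. The second option costs memory on the database host.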

Common findings this catches

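HA replicas with SQLite → each replica writes to its own database, so data diverges; a shared SQLite file risks lock errors and corruption.
DB connection pool too small → requests wait for a free connection.
Sessions not shared across replicas → users are logged out when the load balancer switches replica.
No image renderer → rendered panels in reports and alert notifications fail.
DB CPU spikes → find the slow queries; tune and index.
Alerts firing twice → HA coordination between replicas is missing or misconfigured.
Slow startup → check whether schema migrations rerun on every restart.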
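
Several of these findings show up in Grafana's own `/metrics` output before users notice them. The helper below pulls a single value out of the Prometheus text format, as a sketch. The function name is hypothetical and the metric name in the example is an assumption, so confirm the exact names against your own `/metrics` output. It also assumes label values contain no spaces.

```shell
# Reads Prometheus text exposition on stdin and prints the value of the
# first sample whose metric name matches exactly. Prints nothing if absent.
metric_value() {
  local name="$1"
  awk -v name="$name" '
    /^#/ { next }                     # skip HELP and TYPE lines
    {
      metric = $1
      brace = index(metric, "{")      # strip a {label="..."} suffix if present
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name) { print $2; exit }
    }
  '
}

# Example (metric name is an assumption, verify it on your instance):
#   curl -s http://grafana:3000/metrics | metric_value grafana_database_conn_in_use
```

Comparing the in-use connection count against `max_open_conn` on each replica is a quick way to confirm or rule out the "pool too small" finding.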

When to escalate

Major migration to HA: plan a staged rollout.
DB infrastructure: involve a DBA.
Monitoring at scale: bring in capacity planning.

Grafana HA & Database Backend Tuning Prompt

Related prompts

Grafana Dashboard Performance Prompt

Grafana Version Upgrade & Migration Prompt

Prometheus HA & Deduplication Prompt
