DevOps AI ToolKit
AI for Grafana · By James Joyner IV · 9 min read

Grafana Error Guide: CloudWatch 'Rate exceeded' — Throttling the Data Source

Fix Grafana CloudWatch 'Rate exceeded' throttling errors — reduce GetMetricData API calls, raise account API limits, tune intervals and dashboards, and add retries so panels stop failing.

  • #grafana
  • #troubleshooting
  • #errors
  • #cloudwatch

Overview

Grafana’s CloudWatch data source calls the AWS GetMetricData and ListMetrics APIs, which AWS rate-limits per account and per region. When too many queries fire at once (many panels, short refresh intervals, many dashboards, or many concurrent users), AWS returns ThrottlingException: Rate exceeded and the affected panels fail with an error or show no data. The cause is request volume exceeding an AWS quota, not a Grafana bug.

The literal errors you will see:

ThrottlingException: Rate exceeded
CloudWatch error: Throttling: Rate exceeded status code: 400
logger=tsdb.cloudwatch error="failed to execute query" err="ThrottlingException: Rate exceeded"

The error occurs on dashboard load and refresh, and it gets worse during incidents, when everyone opens CloudWatch dashboards at once.

Symptoms

  • CloudWatch panels intermittently show “Rate exceeded” / no data, then recover.
  • More panels fail as dashboards get larger or refresh faster.
  • Errors spike when many users load CloudWatch dashboards simultaneously.
  • The AWS/Usage CallCount and ThrottleCount metrics for GetMetricData climb.

A quick check of the Grafana server logs:

journalctl -u grafana-server --no-pager | grep -iE "ThrottlingException|Rate exceeded" | tail

A throttled query logs a line like:

logger=tsdb.cloudwatch error="ThrottlingException: Rate exceeded"

Common Root Causes

1. Too many GetMetricData calls per refresh

Every query in every panel turns into GetMetricData requests. Big dashboards with many panels and short refresh intervals blow through the transactions-per-second (TPS) limit.

2. Refresh interval too aggressive

Dashboards set to refresh every 5–10s multiply the call rate; a heavy CloudWatch dashboard on a fast refresh throttles quickly.

3. Many concurrent users / dashboards

During an incident, dozens of people open the same CloudWatch dashboards, and combined they exceed the account’s regional API quota.
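
Causes 1 to 3 multiply together, which a little arithmetic makes concrete. The Python sketch below is a rough estimate only: it assumes each viewer's browser refreshes every panel on every tick and that a panel costs one GetMetricData call (`calls_per_panel` is a hypothetical knob, since Grafana may batch or split queries). The function name and the numbers are illustrative, not taken from Grafana or AWS.

```python
def estimate_calls_per_second(panels: int, refresh_seconds: float, viewers: int,
                              calls_per_panel: float = 1.0) -> float:
    """Rough GetMetricData request rate generated by one dashboard.

    Assumes every viewer refreshes every panel on each tick and that each
    panel costs `calls_per_panel` API calls (hypothetical default of 1).
    """
    if refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be positive")
    return panels * viewers * calls_per_panel / refresh_seconds


if __name__ == "__main__":
    # Illustrative numbers: 40 panels, 30 viewers, 10s versus 1m refresh.
    fast = estimate_calls_per_second(panels=40, refresh_seconds=10, viewers=30)
    slow = estimate_calls_per_second(panels=40, refresh_seconds=60, viewers=30)
    print(f"10s refresh: {fast:.0f} calls/s, 1m refresh: {slow:.0f} calls/s")
```

Treat the result as an order-of-magnitude figure to compare against the account quota, not as a measurement; the AWS/Usage metrics remain the source of truth.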

4. High-cardinality queries / wildcards

Broad dimension wildcards, or * lookups that rely on ListMetrics, fan out into many metrics and therefore many data-point requests from a single query.
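
To see how far a wildcard fans out, one option is to count the series it would match in the output of `aws cloudwatch list-metrics`. The Python sketch below assumes the "Metrics" array from that command's JSON output has been loaded into a list; `matching_series` is a hypothetical helper, and it uses subset matching on dimensions, which may differ from how a given Grafana query is configured.

```python
def matching_series(metrics: list, metric_name: str, wanted: dict) -> int:
    """Count metrics whose dimensions satisfy `wanted`; "*" matches any value."""
    count = 0
    for metric in metrics:
        if metric["MetricName"] != metric_name:
            continue
        # Flatten [{"Name": ..., "Value": ...}] into a plain dict for lookup.
        dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
        if all(name in dims and value in ("*", dims[name])
               for name, value in wanted.items()):
            count += 1
    return count


if __name__ == "__main__":
    # Made-up sample shaped like the "Metrics" array of list-metrics output.
    sample = [
        {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
         "Dimensions": [{"Name": "InstanceId", "Value": f"i-{n:04d}"}]}
        for n in range(200)
    ]
    print("wildcard:", matching_series(sample, "CPUUtilization", {"InstanceId": "*"}))
    print("explicit:", matching_series(sample, "CPUUtilization", {"InstanceId": "i-0007"}))
```

A large gap between the two counts is a sign that the panel should name its dimensions explicitly.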

5. Default account API limit too low

The account’s CloudWatch API request-rate quota hasn’t been raised for the monitoring workload; the ceiling is simply too low.

Diagnostic Workflow

Step 1: Confirm throttling in Grafana logs

journalctl -u grafana-server --no-pager | grep -iE "ThrottlingException|Rate exceeded" | tail -20
docker logs grafana 2>&1 | grep -iE "Throttling|Rate exceeded" | tail

Step 2: Measure the API call rate from AWS side

# Uses GNU date (Linux); on macOS/BSD replace the start time with: date -u -v-1H +%FT%TZ
aws cloudwatch get-metric-statistics \
  --namespace AWS/Usage \
  --metric-name CallCount \
  --dimensions Name=Type,Value=API Name=Resource,Value=GetMetricData Name=Service,Value=CloudWatch Name=Class,Value=None \
  --start-time "$(date -u -d '1 hour ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" \
  --period 60 --statistics Sum --region us-east-1

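The quota is expressed in transactions per second, so divide the per-minute Sum by 60 to get an average TPS and compare that against your account’s GetMetricData quota. A one-minute average hides short bursts, so throttling can occur even when the average sits below the quota.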
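
A minimal Python sketch of that comparison, assuming the JSON printed by the command above has been captured as a string. It takes the largest per-minute Sum and divides it by the period to get an average calls-per-second figure. The function name, the sample datapoints, and `quota_tps` are all hypothetical; read the real quota from Service Quotas.

```python
import json


def peak_average_tps(stats_json: str, period_seconds: int = 60) -> float:
    """Return the highest per-period average calls/second in the CLI output."""
    datapoints = json.loads(stats_json).get("Datapoints", [])
    if not datapoints:
        return 0.0
    return max(dp["Sum"] for dp in datapoints) / period_seconds


if __name__ == "__main__":
    # Made-up sample in the shape returned by get-metric-statistics.
    sample = json.dumps({
        "Label": "CallCount",
        "Datapoints": [
            {"Timestamp": "2024-01-01T00:00:00Z", "Sum": 1200.0, "Unit": "None"},
            {"Timestamp": "2024-01-01T00:01:00Z", "Sum": 3000.0, "Unit": "None"},
        ],
    })
    quota_tps = 50  # hypothetical quota value, not your account's real limit
    peak = peak_average_tps(sample)
    print(f"peak average: {peak:.1f} TPS ({peak / quota_tps:.0%} of quota)")
```

Because the figure is an average over each period, a result near the quota is already a warning sign.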

Step 3: Check the data source config

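# Grafana provisioned CloudWatch data source.
# The credentials and default region set here decide which account and regional quota the queries count against.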
apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: default        # or keys / arn
      defaultRegion: us-east-1

Step 4: Reduce demand on dashboards

  • Raise dashboard refresh intervals (e.g. 1m instead of 5s).
  • Split huge dashboards; reduce panels per dashboard.
  • Replace dimension wildcards with explicit dimensions.
  • Increase the min interval so fewer data points are requested.
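
To find which dashboards need this treatment, one approach is to audit exported dashboard JSON. The Python sketch below counts CloudWatch panels and queries and flags a fast refresh. It assumes the dashboard JSON has a top-level `refresh` string, a `panels` array (with panels possibly nested inside rows), and a panel-level `datasource` object whose `type` is `cloudwatch`; older exports or panels that rely on the default data source may look different. The function names and the 60-second threshold are hypothetical.

```python
import re
from typing import Optional

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def refresh_to_seconds(refresh: object) -> Optional[int]:
    """Convert a refresh string such as "10s" or "1m" to seconds; None if unset."""
    match = re.fullmatch(r"(\d+)([smhd])", str(refresh or ""))
    if not match:
        return None
    return int(match.group(1)) * _UNITS[match.group(2)]


def is_cloudwatch(panel: dict) -> bool:
    """True when the panel's datasource object names the cloudwatch type."""
    ds = panel.get("datasource")
    return isinstance(ds, dict) and ds.get("type") == "cloudwatch"


def audit_dashboard(dashboard: dict, min_refresh_seconds: int = 60) -> dict:
    """Summarise CloudWatch load indicators for one dashboard JSON document."""
    panels = []
    for panel in dashboard.get("panels", []):
        panels.append(panel)
        panels.extend(panel.get("panels", []))  # panels nested inside rows
    cloudwatch = [p for p in panels if is_cloudwatch(p)]
    refresh = refresh_to_seconds(dashboard.get("refresh"))
    return {
        "cloudwatch_panels": len(cloudwatch),
        "cloudwatch_queries": sum(len(p.get("targets", [])) for p in cloudwatch),
        "refresh_seconds": refresh,
        "refresh_too_fast": refresh is not None and refresh < min_refresh_seconds,
    }


if __name__ == "__main__":
    # Made-up dashboard fragment for illustration.
    example = {
        "refresh": "10s",
        "panels": [
            {"datasource": {"type": "cloudwatch"}, "targets": [{}, {}]},
            {"datasource": {"type": "prometheus"}, "targets": [{}]},
        ],
    }
    print(audit_dashboard(example))
```

In practice the input would come from `json.load` on an exported dashboard file; dashboards with many CloudWatch queries and `refresh_too_fast` set are the first candidates for splitting or slowing down.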

Step 5: Request an AWS quota increase

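# L-yyyyyyyy is a placeholder; find the real quota code for the GetMetricData rate with:
#   aws service-quotas list-service-quotas --service-code monitoring --region us-east-1
aws service-quotas get-service-quota \
  --service-code monitoring \
  --quota-code L-yyyyyyyy --region us-east-1
# then run `aws service-quotas request-service-quota-increase` with the same service code and quota code,
# plus --desired-value, to raise the GetMetricData transaction rate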

Example Root Cause Analysis

During an outage, the primary CloudWatch dashboard shows scattered “Rate exceeded” panels while 30 responders have it open. Grafana log:

logger=tsdb.cloudwatch error="ThrottlingException: Rate exceeded status code: 400"

The dashboard has 40 CloudWatch panels and a 10s refresh. With 30 concurrent viewers, that is 40 × 30 × 6 = 7,200 panel refreshes per minute from this one dashboard, before counting multi-query panels and other dashboards. The AWS CallCount for GetMetricData confirms the account is at its regional TPS quota:

GetMetricData CallCount (per min): 48200  # over quota during the incident

Two fixes applied together:

  1. Reduce demand — raise the dashboard refresh to 1m, split the 40 panels into two focused dashboards, and set a sensible min interval.
  2. Raise the ceiling — request a GetMetricData transaction-rate quota increase for the region.

After the refresh change the throttling stops immediately (call rate drops ~6x), and the quota increase gives headroom for future incidents. Root cause: aggressive refresh × many panels × many concurrent viewers exceeding the account’s GetMetricData rate — not a Grafana defect.

Prevention Best Practices

  • Use conservative dashboard refresh intervals (1m+) for CloudWatch; avoid 5–10s on heavy dashboards.
  • Keep CloudWatch dashboards lean: fewer panels, explicit dimensions instead of wildcards, sensible min interval.
  • Consider metric streams / exporting CloudWatch metrics to Prometheus for high-traffic dashboards to offload the API.
  • Monitor your own GetMetricData CallCount and ThrottleCount (AWS/Usage namespace) and alert before hitting the quota.
  • Request a Service Quotas increase for the CloudWatch API transaction rate ahead of growth.

Quick Command Reference

# Throttling in Grafana logs
journalctl -u grafana-server | grep -iE "ThrottlingException|Rate exceeded" | tail

# GetMetricData call rate (AWS/Usage)
aws cloudwatch get-metric-statistics --namespace AWS/Usage \
  --metric-name CallCount \
  --dimensions Name=Type,Value=API Name=Resource,Value=GetMetricData \
    Name=Service,Value=CloudWatch Name=Class,Value=None \
  --start-time "$(date -u -d '1 hour ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" --period 60 --statistics Sum --region us-east-1

# Current quota
aws service-quotas get-service-quota --service-code monitoring \
  --quota-code <code> --region us-east-1

Conclusion

CloudWatch “Rate exceeded” in Grafana is AWS throttling the GetMetricData API because the request rate is over quota. Typical root causes:

  1. Too many GetMetricData calls per refresh (big dashboards).
  2. Refresh intervals set too aggressively.
  3. Many concurrent users/dashboards during incidents.
  4. High-cardinality queries and dimension wildcards.
  5. A default account API quota that’s too low for the workload.

Confirm throttling in the logs, measure your GetMetricData call rate against the quota, then attack both sides — reduce demand (refresh, panels, wildcards) and raise the AWS quota.
