Grafana Error Guide: CloudWatch 'Rate exceeded'

Overview

Grafana’s CloudWatch data source calls the AWS GetMetricData (and ListMetrics) APIs, which are rate-limited per account and per region. When too many queries fire at once (many panels, short refresh intervals, many dashboards, or many concurrent users), AWS rejects the excess calls with ThrottlingException: Rate exceeded and the affected panels show no data. The cause is request volume against an AWS quota, not a Grafana bug.

The literal errors you will see:

ThrottlingException: Rate exceeded

CloudWatch error: Throttling: Rate exceeded status code: 400

logger=tsdb.cloudwatch error="failed to execute query" err="ThrottlingException: Rate exceeded"

It occurs on dashboard load or refresh and gets worse during incidents, when many people open CloudWatch dashboards at once.

Symptoms

CloudWatch panels intermittently show “Rate exceeded” / no data, then recover.
More panels fail as dashboards get larger or refresh faster.
Errors spike when many users load CloudWatch dashboards simultaneously.
AWS ThrottledCount/CallCount metrics for GetMetricData climb.

journalctl -u grafana-server --no-pager | grep -i "ThrottlingException\|Rate exceeded" | tail

logger=tsdb.cloudwatch error="ThrottlingException: Rate exceeded"

Common Root Causes

1. Too many GetMetricData calls per refresh

Each CloudWatch metrics panel typically issues at least one GetMetricData request per refresh, so big dashboards with many panels and short refresh intervals quickly exceed the GetMetricData transactions-per-second (TPS) quota.

2. Refresh interval too aggressive

Dashboards set to refresh every 5–10s multiply the call rate; a heavy CloudWatch dashboard on a fast refresh throttles quickly.

3. Many concurrent users / dashboards

During an incident, dozens of people open the same CloudWatch dashboards, and combined they exceed the account’s regional API quota.

4. High-cardinality queries / wildcards

Broad dimension wildcards, and * in ListMetrics-based queries, fan out into many metrics and many more data points per request.

5. Default account API limit too low

The account’s CloudWatch API request-rate quota hasn’t been raised for the monitoring workload; the ceiling is simply too low.

Diagnostic Workflow

Step 1: Confirm throttling in Grafana logs

journalctl -u grafana-server --no-pager | grep -iE "ThrottlingException|Rate exceeded" | tail -20
docker logs grafana 2>&1 | grep -iE "Throttling|Rate exceeded" | tail
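To see whether the errors come in bursts that line up with dashboard refreshes or incident traffic, the matching lines can be counted per minute. A minimal sketch, assuming journalctl's default short output format (month, day and HH:MM:SS as the first three fields); the function name throttles_per_minute is hypothetical, and docker logs output, which lacks those fields, would need a different parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Grafana CloudWatch throttling errors per minute.
# Assumes journalctl's default "short" format, e.g.
#   Jan 02 15:04:05 host grafana-server[123]: logger=tsdb.cloudwatch ...
# where fields 1-3 are month, day and HH:MM:SS.
throttles_per_minute() {
  grep -iE 'ThrottlingException|Rate exceeded' \
    | awk '{ print $1, $2, substr($3, 1, 5) }' \
    | uniq -c   # journal output is chronological, so equal minutes are adjacent
}

# Usage: last hour of Grafana logs, one line per minute with its error count.
#   journalctl -u grafana-server --no-pager --since "1 hour ago" | throttles_per_minute
```

Spikes that recur at the dashboard refresh interval, or that start when responders join an incident, point at demand rather than a quota change.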

Step 2: Measure the API call rate from AWS side

aws cloudwatch get-metric-statistics \
  --namespace AWS/Usage \
  --metric-name CallCount \
  --dimensions Name=Type,Value=API Name=Resource,Value=GetMetricData Name=Service,Value=CloudWatch Name=Class,Value=None \
  --start-time "$(date -u -d '1 hour ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" \
  --period 60 --statistics Sum --region us-east-1

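Divide each per-minute Sum by 60 to get the average transactions per second, and compare that against your account’s GetMetricData quota (Step 5 shows how to read it). A one-minute average below the quota can still hide short bursts above it, so throttling errors can appear in the Grafana logs even when the average looks safe.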
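The conversion from per-minute Sums to per-second rates can be scripted. A minimal sketch: tps_report is a hypothetical helper, and the quota you pass it is an assumption you must read from Service Quotas (the 50 in the usage line is only a placeholder, not your account's limit).

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn per-minute CallCount Sums (one number per line on
# stdin) into average TPS and flag minutes whose average exceeds a given quota.
tps_report() {
  local quota_tps="$1"
  awk -v q="$quota_tps" '
    NF {
      tps = $1 / 60                      # per-minute Sum -> average per second
      status = (tps > q) ? "OVER" : "ok"
      printf "%d calls/min = %.1f TPS %s\n", $1, tps, status
    }'
}

# Usage: same query as above, sorted by time, one Sum per line.
#   aws cloudwatch get-metric-statistics ... \
#     --query 'sort_by(Datapoints,&Timestamp)[].Sum' --output text \
#     | tr '\t' '\n' | tps_report 50
```

Because it reports averages, a line marked "ok" does not rule out bursts within that minute.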

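Step 3: Check the data source config

Note which region and credentials the data source uses. Throttling applies to the quota of that AWS account in that region, and every Grafana dashboard, user or other tool calling CloudWatch with the same account and region draws from it.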

# Grafana provisioned CloudWatch data source
apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: default        # or keys / credentials; add assumeRoleArn to assume an IAM role
      defaultRegion: us-east-1

Step 4: Reduce demand on dashboards

Raise dashboard refresh intervals (e.g. 1m instead of 5s).
Split huge dashboards; reduce panels per dashboard.
Replace dimension wildcards with explicit dimensions.
Increase the min interval so fewer data points are requested.
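The effect of the min interval can be estimated: a metric query returns roughly one data point per period across the dashboard's time range. A minimal sketch, assuming the CloudWatch period equals the panel's min interval (Grafana can choose a longer period automatically for wide ranges); datapoints_per_query is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: data points one metric query requests for a time range,
# assuming the CloudWatch period equals the panel's min interval.
datapoints_per_query() {
  local range_s="$1" period_s="$2"
  # Ceiling division: a partial period at the end still yields a data point.
  echo $(( (range_s + period_s - 1) / period_s ))
}

# A 6h dashboard range with a 1m versus a 5m min interval.
datapoints_per_query 21600 60    # 360
datapoints_per_query 21600 300   # 72
```

Multiplied by the number of metrics a wildcard expands to, this is roughly the volume each refresh asks CloudWatch for, which is why explicit dimensions and a larger min interval work together.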

Step 5: Request an AWS quota increase

aws service-quotas get-service-quota \
  --service-code monitoring \
  --quota-code L-yyyyyyyy --region us-east-1
# then request-service-quota-increase for the GetMetricData transaction rate
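The quota code is not shown in the throttling error, so it has to be looked up before an increase can be requested. A minimal sketch wrapping the AWS CLI subcommands list-aws-default-service-quotas and request-service-quota-increase; the helper names and the 'GetMetricData' name filter are assumptions, so check the printed quota names before requesting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the AWS CLI Service Quotas commands.
# CloudWatch's service code in Service Quotas is "monitoring".

# List default CloudWatch quotas whose name mentions GetMetricData
# (prints quota code, quota name and default value).
find_gmd_quotas() {
  local region="${1:-us-east-1}"
  aws service-quotas list-aws-default-service-quotas \
    --service-code monitoring --region "$region" \
    --query "Quotas[?contains(QuotaName, 'GetMetricData')].[QuotaCode,QuotaName,Value]" \
    --output text
}

# Request a new value for a quota code found above.
request_gmd_increase() {
  local quota_code="$1" desired="$2" region="${3:-us-east-1}"
  aws service-quotas request-service-quota-increase \
    --service-code monitoring --quota-code "$quota_code" \
    --desired-value "$desired" --region "$region"
}

# Usage (replace the placeholders with real values):
#   find_gmd_quotas us-east-1
#   request_gmd_increase L-yyyyyyyy <desired-TPS> us-east-1
```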

Example Root Cause Analysis

During an outage, the primary CloudWatch dashboard shows scattered “Rate exceeded” panels while 30 responders have it open. Grafana log:

logger=tsdb.cloudwatch error="ThrottlingException: Rate exceeded status code: 400"

The dashboard has 40 CloudWatch panels and a 10s refresh. With 30 concurrent viewers that’s a very high GetMetricData rate. The AWS CallCount for GetMetricData confirms the account is at its regional TPS quota:

GetMetricData CallCount (per min): 48200  # over quota during the incident
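The arithmetic behind "a very high GetMetricData rate" can be made explicit. As a lower bound, assuming each panel refresh issues at least one GetMetricData request and ignoring every other caller in the account:

```latex
R_{10\,\mathrm{s}} \ge \frac{40 \times 30}{10\ \mathrm{s}} = 120\ \text{requests/s} = 7200\ \text{requests/min}
\qquad
R_{60\,\mathrm{s}} \ge \frac{40 \times 30}{60\ \mathrm{s}} = 20\ \text{requests/s}
\qquad
\frac{R_{10\,\mathrm{s}}}{R_{60\,\mathrm{s}}} = 6
```

CallCount counts every GetMetricData call in the account and region (other dashboards, Grafana alert rules, other tools), so the measured 48,200 per minute can sit well above this floor.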

Two fixes applied together:

Reduce demand — raise the dashboard refresh to 1m, split the 40 panels into two focused dashboards, and set a sensible min interval.
Raise the ceiling — request a GetMetricData transaction-rate quota increase for the region.

After the refresh change, the throttling stops immediately (the call rate drops about 6x, matching the move from a 10s to a 60s refresh), and the quota increase gives headroom for future incidents. Root cause: aggressive refresh × many panels × many concurrent viewers exceeding the account’s GetMetricData rate, not a Grafana defect.

Prevention Best Practices

Use conservative dashboard refresh intervals (1m+) for CloudWatch; avoid 5–10s on heavy dashboards.
Keep CloudWatch dashboards lean: fewer panels, explicit dimensions instead of wildcards, sensible min interval.
For high-traffic dashboards, consider CloudWatch metric streams or exporting CloudWatch metrics to Prometheus, so viewers query a single store instead of every refresh calling the CloudWatch API.
Monitor your own GetMetricData CallCount/ThrottledCount and alert before hitting the quota.
Request a Service Quotas increase for the CloudWatch API transaction rate ahead of growth.
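Alerting before the quota is hit can be done with a CloudWatch alarm on the same AWS/Usage CallCount metric used in Step 2. A minimal sketch, assuming you supply the quota as a whole number of TPS and an SNS topic ARN; the helper name, the 80% default threshold and the three-minute evaluation window are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: alarm when GetMetricData calls per minute exceed a
# percentage of the quota. Quota (integer TPS) and SNS topic ARN are inputs.
create_gmd_alarm() {
  local quota_tps="$1" topic_arn="$2" region="${3:-us-east-1}" pct="${4:-80}"
  # Per-minute threshold = quota (per second) * 60 * pct / 100
  local threshold=$(( quota_tps * 60 * pct / 100 ))
  aws cloudwatch put-metric-alarm \
    --alarm-name "cloudwatch-getmetricdata-near-quota-$region" \
    --namespace AWS/Usage --metric-name CallCount \
    --dimensions Name=Type,Value=API Name=Resource,Value=GetMetricData \
                 Name=Service,Value=CloudWatch Name=Class,Value=None \
    --statistic Sum --period 60 --evaluation-periods 3 \
    --threshold "$threshold" --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn" --region "$region"
}

# Usage (placeholders):
#   create_gmd_alarm <quota-TPS> arn:aws:sns:us-east-1:<account-id>:<topic>
```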

Quick Command Reference

# Throttling in Grafana logs
journalctl -u grafana-server | grep -iE "ThrottlingException|Rate exceeded" | tail

# GetMetricData call rate (AWS/Usage)
aws cloudwatch get-metric-statistics --namespace AWS/Usage \
  --metric-name CallCount \
  --dimensions Name=Type,Value=API Name=Resource,Value=GetMetricData \
    Name=Service,Value=CloudWatch Name=Class,Value=None \
  --start-time "$(date -u -d '1 hour ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" --period 60 --statistics Sum --region us-east-1

# Current quota
aws service-quotas get-service-quota --service-code monitoring \
  --quota-code <code> --region us-east-1

Conclusion

CloudWatch “Rate exceeded” in Grafana is AWS throttling the GetMetricData API because the request rate is over quota. Typical root causes:

Too many GetMetricData calls per refresh (big dashboards).
Refresh intervals set too aggressively.
Many concurrent users/dashboards during incidents.
High-cardinality queries and dimension wildcards.
A default account API quota that’s too low for the workload.

Confirm throttling in the logs, measure your GetMetricData call rate against the quota, then attack both sides — reduce demand (refresh, panels, wildcards) and raise the AWS quota.