Grafana CloudWatch Data Source Design Prompt
Design the Grafana CloudWatch data source for metrics, Logs Insights, and cross-account observability with least-privilege IAM.
- Target user: SREs querying AWS from Grafana
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior observability engineer who designs the Grafana CloudWatch data source for metrics, Logs Insights, and cross-account observability.

I will provide:
- The AWS accounts, regions, and namespaces to query
- The auth model available (IAM role, keys, IRSA)
- Current data source config

Your job:

1. **Choose the auth provider**:
   - `default`/SDK chain, `keys`, or `arn` (assume role)
   - On EKS prefer IRSA; on EC2 use the instance role
   - Never embed long-lived keys if a role is available

2. **Scope IAM least-privilege**:
   - `cloudwatch:GetMetricData`, `ListMetrics`, `GetMetricStatistics`
   - `logs:StartQuery`, `GetQueryResults`, `DescribeLogGroups` for Insights
   - `tag:GetResources` for dimension tag queries

3. **Configure the data source**:
   - `authType`, `defaultRegion`, optional `assumeRoleArn`, `externalId`
   - Set namespaces and custom metrics namespaces

4. **Design metric queries**:
   - Use the Metrics Query Editor or Metrics Insights (SQL-like)
   - Set the right statistic and period; mind CloudWatch API cost

5. **Design Logs Insights queries**:
   - Query with the Logs Insights dialect; `stats`, `filter`, `parse`
   - Watch the concurrent query and scan limits

6. **Enable cross-account observability**:
   - Use CloudWatch cross-account with a monitoring account
   - Set `assumeRoleArn` per target or use OAM sink/links

7. **Control cost and rate**:
   - GetMetricData is billed per metric; batch and cache
   - Set alert eval intervals mindful of API throttling

Mark DESTRUCTIVE: broad IAM grants (over-permission), high-frequency queries inflating the AWS bill, editing the shared data source affecting all dashboards.

---

Accounts/regions/namespaces: [DESCRIBE]
Auth model: [DESCRIBE]
Current config: [DESCRIBE]
Why this prompt works
The CloudWatch data source has two traps: cost (GetMetricData is billed per metric requested) and auth (long-lived keys instead of roles). This prompt steers the model toward role-based auth with least-privilege IAM, keeps metric and Logs Insights queries within AWS limits, and covers cross-account observability, so dashboards work without a surprise bill or an over-scoped policy.
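To make the cost trap concrete, here is a rough back-of-the-envelope sketch. The function name `estimate_monthly_cost` is hypothetical, and the default price of 0.01 USD per 1,000 metrics requested is an assumption; check the current CloudWatch pricing page for your region before relying on the numbers.

```shell
# Rough monthly GetMetricData cost for a dashboard that refreshes on a timer.
# ASSUMPTION: price_per_1000 is USD per 1,000 metrics requested (default 0.01);
# confirm the real figure for your region.
estimate_monthly_cost() {
  local metrics_per_refresh="$1"   # metrics requested on each dashboard refresh
  local refresh_seconds="$2"       # dashboard auto-refresh interval in seconds
  local hours_open_per_day="$3"    # hours per day the dashboard is kept open
  local price_per_1000="${4:-0.01}"
  awk -v m="$metrics_per_refresh" -v r="$refresh_seconds" \
      -v h="$hours_open_per_day" -v p="$price_per_1000" 'BEGIN {
        refreshes = (h * 3600 / r) * 30        # refreshes in a 30-day month
        printf "%.2f\n", refreshes * m / 1000 * p
      }'
}

estimate_monthly_cost 200 30 8     # 200 metrics, 30 s refresh, 8 h/day -> 57.60
estimate_monthly_cost 200 300 8    # same dashboard at a 5 min refresh  -> 5.76
```

Under these assumptions, moving one dashboard from a 30-second to a 5-minute refresh cuts the estimate tenfold, which is why refresh and alert evaluation intervals deserve as much review as the queries themselves.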
How to use it
- Pick role/IRSA auth, avoid static keys.
- Scope IAM to the exact CloudWatch/Logs actions.
- Design queries mindful of period and API cost.
- Enable cross-account via assume-role or OAM.
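As an illustration of the "mindful of period" step, the sketch below picks a period that keeps a single metric query under a chosen number of datapoints. `pick_period` and the 1,000-point default are hypothetical choices for this example, not Grafana or AWS settings. It relies on standard-resolution CloudWatch periods being multiples of 60 seconds; older data is only retained at coarser granularity, so long time ranges may need a larger period than this returns.

```shell
# Pick the smallest period (a multiple of 60 s) that keeps a query's
# datapoint count at or below max_points for the given time range.
pick_period() {
  local range_seconds="$1"
  local max_points="${2:-1000}"   # hypothetical budget per metric query
  local raw period
  raw=$(( (range_seconds + max_points - 1) / max_points ))   # ceil(range / max_points)
  period=$(( (raw + 59) / 60 * 60 ))                          # round up to a multiple of 60
  [ "$period" -lt 60 ] && period=60
  echo "$period"
}

pick_period 3600      # 1 hour  -> 60
pick_period 604800    # 7 days  -> 660
```

A coarser period returns fewer datapoints per request, which keeps dashboards responsive over long ranges and makes throttling less likely.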
Useful commands
# Provision the data source (see Example config), then verify health by uid
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  http://grafana:3000/api/datasources/uid/cloudwatch-uid/health | jq

# Validate the assume-role works from the Grafana host
aws sts assume-role --role-arn "$ASSUME_ROLE_ARN" \
  --role-session-name grafana --external-id "$EXTERNAL_ID" | jq '.Credentials.Expiration'

# List metrics in a namespace to confirm access
aws cloudwatch list-metrics --namespace AWS/RDS --region us-east-1 | jq '.Metrics | length'

# Test a Logs Insights query start (date -d is GNU date syntax)
aws logs start-query --log-group-name /aws/lambda/api \
  --start-time "$(date -d '1 hour ago' +%s)" --end-time "$(date +%s)" \
  --query-string 'fields @timestamp, @message | limit 20'
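
The `start-query` call above only returns a query ID; results have to be fetched separately. A minimal sketch of the full round trip follows, assuming the AWS CLI is configured with the role's credentials. The function name `run_insights_query` and the 30-poll default are hypothetical; `--query` and `--output` are the CLI's global output options.

```shell
# Start a Logs Insights query, poll until it finishes, and print the results.
# Usage: run_insights_query LOG_GROUP QUERY START_EPOCH END_EPOCH [MAX_POLLS]
run_insights_query() {
  local log_group="$1" query="$2" start="$3" end="$4" max_polls="${5:-30}"
  local query_id status i=0
  query_id=$(aws logs start-query --log-group-name "$log_group" \
    --start-time "$start" --end-time "$end" \
    --query-string "$query" --query queryId --output text) || return 1
  while [ "$i" -lt "$max_polls" ]; do
    status=$(aws logs get-query-results --query-id "$query_id" \
      --query status --output text) || return 1
    case "$status" in
      Complete)
        aws logs get-query-results --query-id "$query_id" --output json
        return 0 ;;
      Failed|Cancelled|Timeout)
        echo "query ended with status $status" >&2
        return 1 ;;
    esac
    i=$((i + 1))
    sleep 1
  done
  # Stop the query so it does not keep counting against the concurrency limit.
  aws logs stop-query --query-id "$query_id" >/dev/null
  echo "query still running after $max_polls polls" >&2
  return 1
}
```

For example, `run_insights_query /aws/lambda/api 'fields @timestamp, @message | limit 20' "$(date -d '1 hour ago' +%s)" "$(date +%s)"` would print the JSON result set once the query completes. Stopping abandoned queries matters because running queries count toward the concurrent-query limit the prompt warns about.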
Example config
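# provisioning/datasources/cloudwatch.yaml
apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    uid: cloudwatch-uid
    jsonData:
      # Base credentials come from the SDK chain (IRSA or instance role);
      # the role in assumeRoleArn is then assumed on top of them.
      # Older configs use the legacy authType value "arn"; check the docs
      # for your Grafana version before relying on it.
      authType: default
      defaultRegion: us-east-1
      assumeRoleArn: arn:aws:iam::111122223333:role/grafana-cloudwatch
      externalId: grafana-prod
      customMetricsNamespaces: MyApp/Backend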

Least-privilege IAM policy for the assumed role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "tag:GetResources"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:StartQuery",
        "logs:StopQuery",
        "logs:GetQueryResults",
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    }
  ]
}
Common findings this catches
- Surprise AWS bill → high-frequency GetMetricData on many metrics.
- Auth failures → static keys expired or assume-role externalId wrong.
- Over-permission → cloudwatch:* / logs:* wildcards.
- Throttled Logs → exceeding Insights concurrent-query limits.
- No cross-account data → OAM/assume-role not configured.
- Shared-source breakage → editing the data source without change control.
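
The over-permission finding can be spot-checked before a policy is attached. The sketch below is a simple text scan, not a policy parser: `find_wildcard_actions` is a hypothetical helper that only flags whole-service wildcards such as `cloudwatch:*`, and it will miss partial wildcards like `logs:Get*` or a bare `"*"` action.

```shell
# Print every whole-service wildcard action (e.g. "logs:*") found in a
# policy JSON file, one per line, de-duplicated.
find_wildcard_actions() {
  local policy_file="$1"
  grep -oE '"[A-Za-z0-9-]+:\*"' "$policy_file" | tr -d '"' | sort -u
}

# Usage: an empty result means no whole-service wildcards were found.
# find_wildcard_actions grafana-cloudwatch-policy.json
```

For anything beyond a quick check, IAM Access Analyzer policy validation or a review by the security team is the more reliable route.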
When to escalate
- IAM policy and cross-account trust — AWS/security team.
- Cost review of CloudWatch API usage — FinOps.
- Migrating heavy CloudWatch queries to a metrics store — architecture.
Related prompts
- Grafana Azure Monitor Data Source Design Prompt: Design a Grafana Azure Monitor data source covering metrics, Log Analytics (KQL), and Resource Graph queries with least-privilege auth.
- Grafana Data Source Provisioning YAML Prompt: Provision Grafana data sources as code with provisioning YAML in /etc/grafana/provisioning/datasources for reproducible, secret-safe config.
- Grafana Query Caching Enterprise Prompt: Configure Grafana Enterprise query caching to cut data source load and speed dashboards, with per-data-source TTLs and Redis backend.