RDS Aurora Slow Query and Parameter Tuning Prompt
Use Performance Insights, slow-query logs, and EXPLAIN output to find the dominant wait and propose index, query, and parameter-group changes for RDS or Aurora.
- Target user
- Database and backend engineers tuning AWS RDS and Aurora performance
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior database reliability engineer. You tune RDS/Aurora by finding the dominant wait event in Performance Insights first, then attacking the specific query or contention causing it — never by guessing at parameters. I will provide: - The engine and version (MySQL/PostgreSQL/Aurora) and instance class: [ENGINE_AND_CLASS] - Performance Insights top SQL and top wait events: [PERF_INSIGHTS] - Slow-query log entries and the EXPLAIN / EXPLAIN ANALYZE for the worst query: [SLOW_QUERY_AND_EXPLAIN] - Current relevant parameter-group values and the symptom: [PARAMS_AND_SYMPTOM] Do the following, numbered: 1. Identify the dominant load: which wait event tops Performance Insights (CPU, IO read, lock/row-lock, commit/log flush)? State what that class of wait implies about the bottleneck. 2. Read the EXPLAIN output and call out the cost drivers: full table scans, missing or unused indexes, large row estimates vs actual (stale statistics), sorts/temp tables spilling to disk, and nested-loop joins over big sets. Quote the plan node. 3. Propose query-level fixes first (the highest leverage): a covering or composite index with the exact column order and why, a query rewrite, or an `ANALYZE`/statistics refresh — preferring these over hardware. 4. Only then propose parameter-group changes that target the measured wait (e.g. buffer pool / shared_buffers sizing for IO waits, `max_connections` and a pooler for connection storms, log/commit settings for commit waits), noting which require a reboot and which are dynamic. Output as: (a) the dominant wait and what it implies, (b) the EXPLAIN cost drivers with the quoted plan node, (c) ranked fixes — query/index first, parameters second — each with expected effect and how to verify, (d) the single safest change to try first. Test index and parameter changes on a restored snapshot or read replica first. Never add an index directly on a busy production primary without considering the lock/online-DDL implications; never change a parameter group on prod without knowing if it forces a reboot.
Why this prompt works
Database tuning goes wrong when engineers reach for the parameter group or a bigger instance class before they know what the database is actually waiting on. RDS and Aurora expose exactly this signal through Performance Insights: the dominant wait event tells you whether you are CPU-bound, IO-bound, lock-bound, or stalled on commit, and each of those points at a completely different remedy. This prompt anchors the whole analysis to the top wait first, so the fix matches the measured bottleneck rather than a hunch.
Reading the EXPLAIN plan against that wait is where the real diagnosis happens. A full table scan, a row-estimate that is wildly off from the actual count, or a sort spilling to a temp table each map to a concrete cause — a missing composite index, stale statistics, or an under-sized work buffer. By asking the model to quote the specific plan node and propose a covering or composite index with explicit column order, the prompt produces a fix you can reason about, not a vague “add an index” suggestion that may never be used by the planner.
The ordering of remedies — query and index changes before parameters before hardware — reflects where the leverage actually is, and the guardrails keep changes safe. Indexing a large, busy table can lock writes if done naively, and static parameters silently require a reboot to apply. By forcing the model to flag online-DDL paths, reboot-requiring parameters, and to validate on a snapshot or replica first, the prompt keeps the engineer in control of a production database rather than gambling on it.