You are a senior database reliability engineer. You tune RDS/Aurora by finding the dominant wait event in Performance Insights first, then attacking the specific query or contention causing it — never by guessing at parameters. I will provide: - The engine and version (MySQL/PostgreSQL/Aurora) and instance class: [ENGINE_AND_CLASS] - Performance Insights top SQL and top wait events: [PERF_INSIGHTS] - Slow-query log entries and the EXPLAIN / EXPLAIN ANALYZE for the worst query: [SLOW_QUERY_AND_EXPLAIN] - Current relevant parameter-group values and the symptom: [PARAMS_AND_SYMPTOM] Do the following, numbered: 1. Identify the dominant load: which wait event tops Performance Insights (CPU, IO read, lock/row-lock, commit/log flush)? State what that class of wait implies about the bottleneck. 2. Read the EXPLAIN output and call out the cost drivers: full table scans, missing or unused indexes, large row estimates vs actual (stale statistics), sorts/temp tables spilling to disk, and nested-loop joins over big sets. Quote the plan node. 3. Propose query-level fixes first (the highest leverage): a covering or composite index with the exact column order and why, a query rewrite, or an `ANALYZE`/statistics refresh — preferring these over hardware. 4. Only then propose parameter-group changes that target the measured wait (e.g. buffer pool / shared_buffers sizing for IO waits, `max_connections` and a pooler for connection storms, log/commit settings for commit waits), noting which require a reboot and which are dynamic. Output as: (a) the dominant wait and what it implies, (b) the EXPLAIN cost drivers with the quoted plan node, (c) ranked fixes — query/index first, parameters second — each with expected effect and how to verify, (d) the single safest change to try first. Test index and parameter changes on a restored snapshot or read replica first. Never add an index directly on a busy production primary without considering the lock/online-DDL implications; never change a parameter group on prod without knowing if it forces a reboot.

Why this prompt works

Database tuning goes wrong when engineers reach for the parameter group or a bigger instance class before they know what the database is actually waiting on. RDS and Aurora expose exactly this signal through Performance Insights: the dominant wait event tells you whether you are CPU-bound, IO-bound, lock-bound, or stalled on commit, and each of those points at a completely different remedy. This prompt anchors the whole analysis to the top wait first, so the fix matches the measured bottleneck rather than a hunch.

Reading the EXPLAIN plan against that wait is where the real diagnosis happens. A full table scan, a row-estimate that is wildly off from the actual count, or a sort spilling to a temp table each map to a concrete cause — a missing composite index, stale statistics, or an under-sized work buffer. By asking the model to quote the specific plan node and propose a covering or composite index with explicit column order, the prompt produces a fix you can reason about, not a vague “add an index” suggestion that may never be used by the planner.

The ordering of remedies — query and index changes before parameters before hardware — reflects where the leverage actually is, and the guardrails keep changes safe. Indexing a large, busy table can lock writes if done naively, and static parameters silently require a reboot to apply. By forcing the model to flag online-DDL paths, reboot-requiring parameters, and to validate on a snapshot or replica first, the prompt keeps the engineer in control of a production database rather than gambling on it.

RDS Aurora Slow Query and Parameter Tuning Prompt

Why this prompt works

Related prompts

CloudWatch Logs Insights and Alarm Design Prompt

Why this prompt works

Related prompts

CloudWatch Logs Insights and Alarm Design Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet