GCP with AI

Cloud SQL Performance & Connection Tuning Prompt

Diagnose Cloud SQL slowness and connection exhaustion — slow queries, max_connections limits, connector vs proxy choices, and flag tuning — using Insights and metrics, not blind instance upsizing.

Target user: Backend and platform engineers running Cloud SQL
Difficulty: Advanced
Tools: Claude, ChatGPT, Cursor

The prompt

You are a senior database reliability engineer who tunes Cloud SQL by reading Query Insights and the connection metrics before resizing anything. You know that "the database is slow" is usually a missing index, an N+1 query pattern, too many open connections, or a pool misconfiguration, not a need for a bigger machine.

I will provide:
- Engine and tier: PostgreSQL / MySQL / SQL Server, plus the output of `gcloud sql instances describe INSTANCE` (vCPU, RAM, tier, flags): [INSTANCE DESCRIBE OUTPUT]
- The symptom: latency, timeouts, "too many connections", or specific slow queries: [SYMPTOM]
- Query Insights / slow query output: [TOP QUERIES BY TIME, WAIT EVENTS]
- Connection metrics: active connections vs max_connections, and how the app connects (Cloud SQL Auth Proxy, language connector, or direct IP): [CONNECTION METRICS + METHOD]
- The connection pool config (pool size, idle timeout) on the app side: [POOL CONFIG]

Your job:

1. **Classify the bottleneck**: query-level (slow SQL, missing index, table scans), connection-level (hitting `max_connections`, churn, no pooling), or resource-level (CPU/memory/IO saturation). Use the metrics to justify which one applies, and leave the other categories alone unless the metrics implicate them too.

2. **Query tuning**: for the worst queries from Insights, identify missing indexes, full scans, or lock waits, and propose the index or query rewrite. Note the cost of adding an index (write amplification, build time).

3. **Connection strategy**: reconcile app pool size × app instance count against the database instance's `max_connections`. Recommend app-side pooling (or server-side pooling such as PgBouncer) and the right connection method: the Auth Proxy or a language connector when IAM auth and encryption are needed, or a direct IP connection otherwise. Flag connection churn.

4. **Flags & sizing**: only after the above, recommend database flag changes (e.g. `max_connections`, `work_mem`, InnoDB settings) and say whether a tier change is genuinely warranted. Treat upsizing as the last resort, not the first.

5. **Validate safely**: propose testing index and flag changes on a clone or read replica first, and give the gcloud commands to do so. Some flags require a restart, so call that out.

Output: (a) bottleneck classification with metric evidence, (b) query fixes with their trade-offs, (c) a connection/pooling plan, (d) flag/sizing recommendations as a last step, (e) a safe validation path noting any restart.

Bias toward indexing and pooling before resizing, and toward testing on a clone/replica. Flag any change that needs a restart and let me schedule it.

Why this prompt works

The reflex for a slow database is to make it bigger, but on Cloud SQL the real cause is usually a missing index, an application that opens far more connections than the instance allows, or a pool that churns connections on every request. This prompt forces classification from Query Insights and the connection metrics before any resize, so the engineer fixes the actual bottleneck instead of paying for headroom that doesn’t help.
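The classification step can be sketched as a small decision function. This is only an illustration: the `Snapshot` fields, the thresholds, and the order of the checks are assumptions chosen for the example, not values taken from Cloud SQL or Query Insights, and a real diagnosis would weigh more signals (wait events, IO, lock time).

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical summary of metrics pasted into the prompt."""
    cpu_utilization: float      # 0.0-1.0, instance CPU
    memory_utilization: float   # 0.0-1.0, instance memory
    active_connections: int     # connections currently open
    max_connections: int        # value of the max_connections flag
    top_query_share: float      # fraction of total DB time spent in the worst query


def classify_bottleneck(s: Snapshot) -> str:
    """Return a coarse bottleneck class; all thresholds are illustrative."""
    if s.max_connections <= 0:
        raise ValueError("max_connections must be positive")

    # Connection pressure is checked first: near the limit, new clients are
    # refused regardless of how fast individual queries run.
    if s.active_connections / s.max_connections >= 0.85:
        return "connection-level"

    # One query dominating DB time points at an index or rewrite, even if
    # CPU is also high (the hot query is often what is burning the CPU).
    if s.top_query_share >= 0.40:
        return "query-level"

    # Only when neither of the above explains the symptom is saturation
    # treated as a sizing problem.
    if s.cpu_utilization >= 0.85 or s.memory_utilization >= 0.90:
        return "resource-level"

    return "inconclusive"


if __name__ == "__main__":
    print(classify_bottleneck(Snapshot(0.9, 0.5, 40, 200, 0.6)))
```

Checking resource saturation last mirrors the prompt's bias: high CPU alone does not justify a bigger tier if a single query or the connection count already explains it.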

The connection-strategy step targets a failure mode that is easy to miss on Cloud SQL: app-side pool size multiplied by the number of app instances can quietly exceed the database instance's max_connections, and that is where a lot of outages hide. The choice between the Auth Proxy, a language connector, and a direct connection also affects both security (IAM auth, encryption) and connection overhead. Putting flags and tier changes last enforces the right order: the cheap, reversible fixes come before the expensive, disruptive ones.
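The pool arithmetic is simple enough to check by hand, but a short sketch makes the failure case obvious. The function and parameter names below are hypothetical, and the `reserved` headroom for admin and maintenance sessions is an assumed figure to adjust for your engine.

```python
def connection_budget(pool_size: int, max_overflow: int, app_instances: int,
                      max_connections: int, reserved: int = 10) -> dict:
    """Compare worst-case app demand against what the database can accept.

    pool_size / max_overflow: per-process pool settings on the app side.
    app_instances: number of app processes, each holding its own pool.
    reserved: connections kept back for admin/maintenance use (assumption).
    """
    peak_demand = (pool_size + max_overflow) * app_instances
    usable = max_connections - reserved
    return {
        "peak_demand": peak_demand,
        "usable": usable,
        "headroom": usable - peak_demand,
        "fits": peak_demand <= usable,
    }


def max_safe_pool_size(app_instances: int, max_connections: int,
                       reserved: int = 10) -> int:
    """Largest per-process total (pool + overflow) that still fits."""
    if app_instances <= 0:
        raise ValueError("app_instances must be positive")
    return max(0, (max_connections - reserved) // app_instances)


if __name__ == "__main__":
    # 12 app instances, each allowed 20 + 10 connections, against a limit of 200.
    print(connection_budget(20, 10, 12, 200))
    # {'peak_demand': 360, 'usable': 190, 'headroom': -170, 'fits': False}
    print(max_safe_pool_size(12, 200))  # 15
```

In this example the fix is a smaller per-instance pool or a shared pooler in front of the database, not a higher `max_connections`; autoscaling makes the mismatch worse because `app_instances` grows under exactly the load that exposes it.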

The safety guardrails are specific to this service. Several flag and tier changes require a restart, and an index build on a large production table can block writes to that table or overload the instance, so the prompt makes the model surface those risks and route validation through a clone or replica. The human decides when downtime happens and confirms the change is safe before it touches production.
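One way to picture the resulting hand-off is as a rollout plan that separates changes by risk. The sketch below is hypothetical throughout: the `Change` type, the change names, and the `needs_restart` values are placeholders, and whether a given flag actually requires a restart must be looked up in the Cloud SQL flags documentation for your engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str
    needs_restart: bool     # supplied by the engineer after checking the docs
    tested_on_clone: bool   # has this been validated on a clone or replica?


def plan_rollout(changes: list[Change]) -> dict[str, list[str]]:
    """Group proposed changes so that untested work never reaches production
    and restart-requiring work waits for a window the human schedules."""
    plan = {"blocked_until_tested": [], "apply_online": [],
            "needs_maintenance_window": []}
    for c in changes:
        if not c.tested_on_clone:
            plan["blocked_until_tested"].append(c.name)
        elif c.needs_restart:
            plan["needs_maintenance_window"].append(c.name)
        else:
            plan["apply_online"].append(c.name)
    return plan


if __name__ == "__main__":
    # Illustrative inputs only; table and flag choices are made up.
    proposed = [
        Change("add index on orders(customer_id)", False, True),
        Change("raise max_connections", True, True),
        Change("raise work_mem", False, False),
    ]
    for bucket, names in plan_rollout(proposed).items():
        print(bucket, names)
```

The point of the grouping is that the model proposes and the engineer disposes: nothing in the maintenance-window bucket is applied until someone picks the time.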
