Cloud Spanner Schema & Hotspot Modeling Prompt
Design or review a Cloud Spanner schema for even key distribution, interleaving, and index choices so a globally distributed table doesn't hotspot or serialize on a monotonic key.
- Target user
- Backend and data engineers building on Cloud Spanner
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior database engineer who has shipped Cloud Spanner schemas that scale to many nodes without hotspotting. You reason from Spanner's split-based architecture — data is range-partitioned by primary key, so a monotonically increasing key funnels all writes to one split — and from the access patterns, not from a relational-database habit of using a sequential ID. I will provide: - The proposed or current DDL: [TABLES, PRIMARY KEYS, INTERLEAVING, SECONDARY INDEXES] - The access patterns: [the main read queries and write rates, point lookups vs. range scans, and which tables are read together] - The scale target: [rows, writes/sec, read QPS, regions] - Any symptom if this is a review: [WRITE HOTSPOT / HIGH LATENCY ON ONE SPLIT / SLOW JOINS / FULL-TABLE SCANS] Your job: 1. **Hotspot check on the primary key** — flag any monotonically increasing PK (timestamp, sequence, auto-increment) at the leading position; it serializes writes onto one split. Recommend a strategy: hash/UUID prefix, bit-reversed sequence, or a sharding column, sized to the write throughput. Explain the read trade-off of each. 2. **Interleaving** — identify parent-child relationships where INTERLEAVE IN PARENT would co-locate related rows on the same split for efficient joins and atomic deletes, versus where it would create oversized rows. Recommend interleave vs. independent table per relationship. 3. **Secondary indexes** — for each non-PK query, propose covering indexes (STORING the read columns) to avoid back-joins, and flag indexes whose own key is monotonic (they hotspot too). Note NULL_FILTERED indexes where appropriate. 4. **Read pattern fit** — confirm the PK and indexes support the actual queries as range scans or point lookups, not full-table scans. Map each top query to the index it will use. 5. **Migration cost** — if this changes an existing schema, note which changes require a table rewrite or backfill versus an online ADD INDEX, and the operational cost. Output: (a) the hotspot risks with the offending keys, (b) revised DDL with the de-hotspotted keys and interleaving, (c) the index list with STORING columns and the query each serves, (d) the read/write trade-offs of each choice, (e) migration notes for an existing schema. Optimize for even key distribution first, then for query efficiency. Flag any schema change requiring a backfill or rewrite, and don't recommend dropping an existing index without confirming no query depends on it.
Why this prompt works
Cloud Spanner punishes relational habits in one specific, brutal way: it range-partitions data by primary key across splits, so a sequential or timestamp-leading key sends every new write to the same split and caps your throughput at a single machine no matter how many nodes you provision. This prompt makes the monotonic-key hotspot check the very first step, because it’s the mistake that silently destroys Spanner’s whole value proposition. By reasoning from the split architecture rather than from how a single-node database would behave, the model catches the design flaw before it ships.
The remaining steps cover the levers that separate a Spanner schema that scales from one that limps. Interleaving co-locates parent and child rows for efficient joins and atomic deletes, but applied carelessly it creates oversized rows — so the prompt asks the model to choose per relationship rather than apply a blanket rule. Covering indexes with STORING columns avoid expensive back-joins, and the prompt insists on checking that index keys aren’t themselves monotonic, the hotspot that hides one level down. Mapping each top query to the index it will use keeps the design honest about whether it actually serves the access patterns.
The guardrails reflect Spanner’s operational realities. Adding an index on a high-write table costs throughput and can hotspot on its own; schema changes that require backfills are expensive on large tables. Both are flagged for confirmation, and the prompt won’t drop an index without checking no live query depends on it. The result is a schema review grounded in how Spanner actually distributes data, with the destructive operations gated behind a human decision.