Skip to content
DevOps AI ToolKit
Newsletter
All guides
GCP with AI By James Joyner IV · · 11 min read

Modeling Cloud Spanner Schemas With AI: Hotspots and Interleaving

Cloud Spanner punishes sequential keys with write hotspots that cap your throughput. Here's how I use AI to design keys, interleaving, and indexes that actually scale.

  • #gcp
  • #ai
  • #cloud-spanner
  • #database
  • #schema-design

A team migrated a service to Cloud Spanner expecting it to scale horizontally, provisioned a generous number of nodes, and then watched write throughput flatten at roughly what a single machine could do. The schema used an auto-incrementing primary key, carried over from the Postgres design it replaced. In Spanner, that’s the cardinal sin: the database range-partitions data by primary key across units called splits, so a monotonically increasing key sends every new write to the same split — the one holding the current high end of the range. All those nodes sat idle while one split took the entire write load. Spanner rewards a specific way of thinking about keys and punishes relational habits, and AI is a good thinking partner here precisely because the rules are concrete even when they’re counterintuitive.

The hotspot check comes first

Before anything else — before indexes, before interleaving — I check the primary key for monotonicity, because a hotspotted PK makes every other optimization irrelevant.

Prompt: “Review this Cloud Spanner DDL for write hotspots. Spanner range-partitions by primary key across splits, so a monotonically increasing leading key column (timestamp, sequence, auto-increment) funnels all writes to one split. Flag any such key, and for each, recommend a de-hotspotting strategy — hashed/UUID prefix, bit-reversed sequence, or a sharding column — with the read trade-off of each.”

The trade-off is the part people miss. Hashing the key spreads writes perfectly but destroys range-scan locality, so if your main read pattern is “give me the last hour of events in order,” a pure hash makes that scan expensive. A bit-reversed sequence or a bounded shard column can balance write distribution against read locality. The model lays out the options; I pick based on whether the workload is read-heavy on ranges or write-heavy on inserts.

-- Hotspot risk: every insert lands on the latest split
CREATE TABLE Events (
  EventId INT64 NOT NULL,  -- auto-incremented application-side
  ...
) PRIMARY KEY (EventId);

-- Spread writes with a shard prefix, keep time-range locality within a shard
CREATE TABLE Events (
  ShardId  INT64 NOT NULL,   -- hash(EventId) % N
  EventTs  TIMESTAMP NOT NULL,
  EventId  INT64 NOT NULL,
  ...
) PRIMARY KEY (ShardId, EventTs, EventId);

Interleaving: co-locate what you read together

Interleaving physically stores child rows alongside their parent on the same split. Done right, it makes parent-child joins cheap and lets you delete a parent and its children atomically. Done wrong, it bloats a parent row with more children than a split wants to hold. The decision is per relationship, not a blanket policy.

Prompt: “Here are two parent-child relationships in my schema with their access patterns and rough child-row counts per parent. For each, recommend INTERLEAVE IN PARENT versus an independent table. Interleave when we read parent and children together and the child count per parent is bounded; independent when child counts are unbounded or the tables are queried separately. Explain the reasoning for each.”

A CustomersOrders relationship where a customer has a bounded, jointly-read set of recent orders is a good interleave. A CustomersEventLog relationship where events grow without bound is not — that wants its own table.

Indexes that don’t back-join or hotspot

Every query that doesn’t filter on the primary key needs a secondary index, and the two things to get right are covering the read columns and not re-introducing a hotspot in the index key itself.

Prompt: “For each of these top queries, propose a secondary index. Use STORING to cover the columns the query reads so Spanner doesn’t back-join to the base table. Flag any index whose leading key column is monotonic — it hotspots writes the same way a monotonic PK does. Map each query to the index that will serve it.”

The back-join point is real money: without STORING the needed columns, Spanner reads the index to find the keys and then reads the base table again to get the values, doubling the work. And an index keyed on a timestamp hotspots writes exactly like a monotonic PK does — the hotspot just moved one level down where it’s easier to miss.

Migration cost is a first-class concern

When this is a review of an existing schema rather than a greenfield design, the question isn’t only “what’s the right shape” but “what does it cost to get there.” Some changes are online; others rewrite the table.

Prompt: “I want to change this existing Spanner table’s primary key to de-hotspot it. Explain why a PK change requires creating a new table and backfilling rather than an in-place alter, outline the migration approach (new table, dual-write, backfill, cutover), and flag the operational cost on a large table.”

A primary key is part of the physical layout, so changing it means a new table and a backfill — there’s no in-place alter. The model is good at laying out the dual-write-and-backfill dance; I’m the one who decides whether the migration window is acceptable.

The honest division of labor

AI is strong at the mechanical reasoning Spanner demands: spotting a monotonic key, choosing covering columns for an index, deciding interleave-versus-independent from access patterns. Those follow from how Spanner distributes data, which is well-documented, so the model is reliable on them. What it can’t see is your real write distribution at scale or whether a backfill window is operationally acceptable — so it drafts the schema and I validate the de-hotspotting against real load before trusting it.

The rule I hold to: optimize for even key distribution first, and never drop an index or run a rewrite without confirming what depends on it. The reusable prompts live in my prompts library, and the GCP with AI series covers the other managed databases too, including Cloud SQL performance tuning for when a relational engine is the right fit instead. Spanner scales beautifully once your keys let it.

Free download · 368-page PDF

Download the Free 500-Prompt DevOps AI Toolkit

500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.

  • 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
  • Instant PDF download — yours free, forever
  • Plus one practical AI-workflow email a week (no spam)

Single opt-in · unsubscribe anytime · no spam.