Azure with AI · By James Joyner IV · 11 min read

Cosmos DB Data Modeling With AI: The Partition Key Is the Whole Game

Cosmos DB punishes a bad partition key with hot partitions and runaway RU cost — and you can't change it later. Here's how AI helps you model from access patterns, not a relational schema.

  • #azure
  • #ai
  • #cosmos-db
  • #nosql
  • #data-modeling

The Cosmos DB bill doubled overnight and nobody had changed anything. Except they had: a new feature added a query that filtered on a field that wasn’t the partition key, so every call fanned out across all physical partitions, and the request-unit charge multiplied. The partition key had been chosen months earlier by analogy to the table’s primary key in the old relational database — which is the single most common way Cosmos DB modeling goes wrong. And because you can’t change a partition key on an existing container, fixing it meant a full migration. In Cosmos, the partition key is the decision everything hangs on, and it’s the one most likely to be made carelessly.

Cosmos rewards modeling from access patterns and punishes modeling from entities. The relational instinct — normalize, then add indexes — produces hot partitions and expensive cross-partition queries. AI is genuinely useful here because it can reason from the reads and writes you actually perform to a key that spreads load and keeps the hot queries single-partition. But it has to be pushed to start from access patterns, and it has to be reminded that the partition key is permanent.

Lead with access patterns, not the schema

The right input to a Cosmos model isn’t a list of entities — it’s a list of the actual queries and writes, their frequency, and which ones are latency-critical. The partition key then has to satisfy two things at once: spread writes and storage evenly so no single partition gets hot, and let the highest-frequency reads hit a single partition so they stay cheap.

Prompt: “I’m modeling orders in Cosmos DB. Access patterns: (1) read all orders for a customer — very frequent, latency-critical; (2) read a single order by ID — frequent; (3) report total revenue per day — infrequent, batch. Item count is in the tens of millions. Recommend a partition key that keeps pattern 1 single-partition and spreads writes evenly, and explain why each candidate key works or fails.”

A good answer reasons about cardinality and skew explicitly: customerId keeps the frequent per-customer read single-partition and spreads reasonably if customers are numerous and balanced, while a low-cardinality key like region would pack everyone into a handful of hot partitions, and a monotonic key like orderDate would hotspot every write onto today’s partition. It should also cover the other two patterns: with customerId as the key, reading a single order by ID stays a cheap point read only when the caller supplies the customerId along with the order ID, and the infrequent daily revenue report can tolerate a cross-partition scan. That trade-off reasoning is exactly what AI does well, and exactly what the relational instinct skips. This access-pattern-first thinking runs through the broader Azure data work.
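The skew argument can be checked with a few lines of arithmetic before any container exists. The sketch below is a toy model, not a Cosmos API call: `synthetic_orders` and `hottest_share` are hypothetical helpers, and the workload (evenly balanced customers, four regions, thirty days) is an assumption chosen to make the comparison easy to read. It measures what fraction of one day's writes would land on the single busiest value of each candidate key.

```python
from collections import Counter

REGIONS = ["us", "eu", "apac", "latam"]  # hypothetical low-cardinality values


def synthetic_orders(n=9000, customers=1000, days=30):
    """Build a deterministic, evenly balanced toy workload (an assumption, not real data)."""
    per_day = n // days
    return [
        {
            "id": f"order-{i}",
            "customerId": f"cust-{i % customers}",
            "region": REGIONS[i % len(REGIONS)],
            "orderDate": f"day-{i // per_day:02d}",
        }
        for i in range(n)
    ]


def hottest_share(items, key):
    """Fraction of items that land on the single busiest value of `key`."""
    counts = Counter(item[key] for item in items)
    return max(counts.values()) / len(items)


orders = synthetic_orders()
# Look only at the most recent day's writes to expose the monotonic-key hotspot.
todays_writes = [o for o in orders if o["orderDate"] == "day-29"]

for key in ("customerId", "region", "orderDate"):
    share = hottest_share(todays_writes, key)
    print(f"{key:>10}: {share:.3f} of today's writes hit one key value")
```

Under these assumptions orderDate sends every one of today's writes to a single key value, region sends a quarter of them, and customerId sends a fraction of a percent. Swapping the synthetic list for a sample exported from the real workload is the useful version of this check, since real customers are rarely this balanced.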

Embed what’s read together, reference what grows

Cosmos gives you a choice the relational world doesn’t: embed related data in one document or reference it across documents. Embed data that’s read together and bounded in size — an order and its line items. Reference data that’s large, unbounded, or updated independently — a customer’s entire order history doesn’t belong inside the customer document. The trap is embedding something that grows toward the 2 MB item limit, or that forces a rewrite of a huge document on every tiny change.

Prompt: “For my orders model, should I embed line items inside the order document or reference them in a separate container? Line items are read together with the order, there are usually fewer than 50 per order, and they never change after the order is placed. Explain the embed-vs-reference trade-off for this case and flag any 2 MB item-size risk.”

Bounded, read-together, and immutable is the textbook case for embedding, and AI should say so, while flagging that an order with thousands of line items would push toward referencing. The prompts library has the full modeling prompt that walks this decision for every entity.
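To make the item-size risk concrete, here is a minimal sketch of an embedded order document and a rough size check against the 2 MB limit. The document shape, the field names, and the helpers `build_order`, `sample_items`, and `size_ratio` are all hypothetical, and the serialized JSON length is only an approximation of how the service accounts for item size.

```python
import json

MAX_ITEM_BYTES = 2 * 1024 * 1024  # the 2 MB item limit discussed above


def build_order(order_id, customer_id, line_items):
    """Embed line items directly in the order document (hypothetical shape)."""
    return {
        "id": order_id,
        "customerId": customer_id,  # partition key value
        "lineItems": line_items,
    }


def sample_items(count):
    """Generate placeholder line items of roughly realistic size."""
    return [
        {"sku": f"SKU-{i:06d}", "qty": 1, "unitPrice": 19.99,
         "name": "example item " + "x" * 40}
        for i in range(count)
    ]


def size_ratio(doc):
    """Serialized UTF-8 size as a fraction of the item limit (approximate)."""
    return len(json.dumps(doc).encode("utf-8")) / MAX_ITEM_BYTES


typical = build_order("order-1", "cust-42", sample_items(50))
extreme = build_order("order-2", "cust-42", sample_items(50_000))

print(f"50 line items:     {size_ratio(typical):.2%} of the limit")
print(f"50,000 line items: {size_ratio(extreme):.2%} of the limit")
```

An order with fifty line items uses a tiny fraction of the limit, which is why embedding is safe in the stated case. The fifty-thousand-item order overshoots it, which is the point at which referencing becomes the only option.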

Tune the indexing policy for write-heavy containers

By default Cosmos indexes every property, which is convenient for ad-hoc queries and expensive for writes — every indexed path costs RU on every write. On a write-heavy container, excluding paths you never filter on is real money saved.

{
  "indexingMode": "consistent",
  "includedPaths": [{ "path": "/customerId/?" }, { "path": "/status/?" }],
  "excludedPaths": [{ "path": "/*" }]
}

Prompt: “My orders container is write-heavy and I only ever filter on customerId and status. Write an indexing policy that indexes just those paths and excludes everything else, and explain how much write-RU pressure the default index-everything policy was adding.”

AI drafts the policy; you confirm the included paths cover every query you actually run, because excluding a path you later filter on silently makes that query expensive.
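That confirmation step can be partly automated. The sketch below is a simplified check, not the service's own path-resolution logic: `path_matches` and `unindexed_filters` are hypothetical helpers, they handle only scalar paths ending in `/?` and wildcard paths ending in `/*`, and "longest matching pattern wins" is a rough stand-in for the rule that the more precise path takes precedence. The `/orderDate` filter is an invented example of a query added later.

```python
def path_matches(pattern, property_path):
    """Match a property path like '/status' against a policy path like '/status/?' or '/*'."""
    if pattern.endswith("/*"):
        prefix = pattern[:-2]  # '/*' -> '', '/address/*' -> '/address'
        return property_path.startswith(prefix + "/")
    if pattern.endswith("/?"):
        return property_path == pattern[:-2]
    return property_path == pattern


def unindexed_filters(policy, filter_paths):
    """Return the filter paths that the policy leaves unindexed."""
    rules = [(p["path"], True) for p in policy.get("includedPaths", [])]
    rules += [(p["path"], False) for p in policy.get("excludedPaths", [])]
    missing = []
    for path in filter_paths:
        matches = [
            (len(pattern), included)
            for pattern, included in rules
            if path_matches(pattern, path)
        ]
        # Longest matching pattern wins (simplified precedence, see lead-in).
        if not matches or not max(matches)[1]:
            missing.append(path)
    return missing


policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/customerId/?"}, {"path": "/status/?"}],
    "excludedPaths": [{"path": "/*"}],
}

# Paths the application filters on; '/orderDate' is a hypothetical later addition.
filters = ["/customerId", "/status", "/orderDate"]
print("Filters with no index:", unindexed_filters(policy, filters))
```

Running a check like this against the list of filter fields in the codebase catches the new query before it ships, rather than after the RU charge shows up.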

The partition key is permanent — model the migration

Here’s the constraint that makes all of this high-stakes: you cannot change the partition key on an existing Cosmos container. Fixing a bad key means creating a new container with the right key and migrating the data. So any key-change advice has to come with a real migration path, not an in-place edit that doesn’t exist.

Prompt: “I need to change my Cosmos container’s partition key from orderId to customerId to fix cross-partition fan-out. The container has 20 million items and the app is live. Give me a safe migration plan — new container, dual-write or backfill, cutover, verification — and how to roll back if the new model underperforms.”

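The plan should be: stand up the new container, start dual-writing from the app so changes made during the migration reach both containers, backfill the historical data, verify that the contents match and the new model performs, cut reads over, then retire the old container once rollback is no longer needed. On a live app a backfill alone is not enough, because writes that land during the copy would be missed. AI sequences this correctly; you own the cutover and verification. Validating the new key against real cardinality and access patterns before committing is the whole point. Get it wrong twice and you’ve migrated twenty million items for nothing.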
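The ordering of those steps matters more than the tooling, and it can be rehearsed in memory. The sketch below uses plain dictionaries as stand-ins for the old and new containers, so `dual_write`, `backfill`, and `verify` are hypothetical helpers rather than SDK calls. It keys items by id alone for brevity, where a real container would identify an item by partition key plus id.

```python
import hashlib
import json


def fingerprint(item):
    """Stable hash of an item's content, used to compare the two containers."""
    payload = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dual_write(old, new, item):
    """Application write path during the migration: write to both containers."""
    old[item["id"]] = dict(item)
    new[item["id"]] = dict(item)


def backfill(old, new):
    """Copy items the new container lacks; never overwrite dual-written data."""
    for item_id, item in old.items():
        new.setdefault(item_id, dict(item))


def verify(old, new):
    """Return ids that are missing from, or differ in, the new container."""
    return sorted(
        item_id
        for item_id, item in old.items()
        if item_id not in new or fingerprint(new[item_id]) != fingerprint(item)
    )


# Toy data: five historical orders in the old container, new container empty.
old = {
    f"order-{i}": {"id": f"order-{i}", "customerId": f"cust-{i}", "total": 10.0 * i}
    for i in range(5)
}
new = {}

# Live traffic arrives while the backfill is still pending.
dual_write(old, new, {"id": "order-1", "customerId": "cust-1", "total": 99.0})
dual_write(old, new, {"id": "order-5", "customerId": "cust-5", "total": 50.0})

backfill(old, new)
print("Mismatched ids before cutover:", verify(old, new))
```

The backfill skips ids that dual-writing has already placed in the new container, so a slow historical copy cannot clobber a fresher write. An empty mismatch list is the gate for cutting reads over; anything else means the old container stays authoritative.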

The discipline

Cosmos modeling is partition-key modeling. Start from access patterns and pick a key that spreads load and keeps the hot reads single-partition. Embed what’s bounded and read together; reference what grows. Trim the indexing policy on write-heavy containers. And treat the partition key as permanent — validate it before you commit, and plan a real migration if it has to change. AI reasons about cardinality, skew, and RU cost from your real workload; you verify the key against the access patterns and own the irreversible decision. Do that and the bill stops doubling overnight. There’s more data material in the Azure category, and the Cosmos modeling prompt is ready to copy from the prompts library.
