Designing JSONB Columns in Postgres With AI

JSONB is one of the best and most misused features in Postgres. It lets you store semi-structured data with real indexing and query support, which is genuinely powerful for things like event payloads, third-party API responses, and flexible product attributes. It is also the easiest way to turn a relational database into a slow document store, one SELECT * at a time, when teams reach for it to avoid the discipline of schema design. The difference between JSONB as a sharp tool and JSONB as technical debt comes down to a handful of decisions made when you create the column. AI is a strong reviewer here: give it a proposed schema and a description of your access patterns, and it will spot the columns that should never have been JSONB and the indexes you forgot. You still validate against real query plans.

This is how I design JSONB columns that stay fast.

First Decide: JSONB, Normalized, or Hybrid

The first question is not “how do I index JSONB” but “should this be JSONB at all.” Three patterns cover almost everything.

Use normalized columns for any attribute you filter, join, sort, or aggregate on regularly, or that has a stable shape. A customer’s email, an order’s status, a price — these belong in typed columns with constraints. Putting them in JSONB throws away type checking, foreign keys, and the planner’s best statistics.

Use JSONB for data that is genuinely variable in shape, sparse, or owned by an external system whose schema you do not control — webhook payloads, audit detail blobs, per-tenant custom fields.

Use a hybrid for the common middle ground: promote the few keys you query into real columns and keep the long tail in a JSONB column. This is the pattern most production tables actually want.

CREATE TABLE events (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  occurred_at timestamptz NOT NULL,
  event_type  text NOT NULL,          -- promoted: always filtered
  user_id     bigint,                 -- promoted: always joined
  payload     jsonb NOT NULL          -- the variable remainder
);

The promoted columns get normal btree indexes and accurate statistics; the JSONB holds whatever the event happened to carry. If you find yourself reaching into payload for the same key in every query, that key wants to be promoted too.

Prompt: “Review this proposed table. We store user profiles with id, created_at, and a single data jsonb holding email, country, subscription_tier, and a free-form preferences object. We filter on email and subscription_tier constantly. What would you change?”

Output (abridged): “Pull email, country, and subscription_tier out of data into typed columns. You filter on email and subscription_tier constantly, so they need btree indexes and real statistics — inside JSONB the planner estimates poorly and you lose a unique constraint on email. Keep only preferences in JSONB, since it’s genuinely free-form. Add a GIN index on preferences with jsonb_path_ops if you query it by containment; otherwise leave it unindexed. This is the hybrid pattern, and it will be both faster and safer than the all-JSONB design.”

That review is exactly the call I would make, and it shows where AI earns its place: it caught the missing unique constraint and the statistics problem in one pass.

Indexing JSONB With GIN

When you do query inside JSONB, the index you want is almost always GIN. There are two operator classes, and the choice matters.

The default jsonb_ops indexes every key and value and supports key-existence operators (?, ?|, ?&) plus containment (@>). The jsonb_path_ops class indexes only the paths-to-values, which makes it smaller and faster for the containment queries that dominate real workloads — at the cost of not supporting the key-existence operators.

-- Smaller, faster, containment-only — the usual right choice
CREATE INDEX events_payload_idx
  ON events USING gin (payload jsonb_path_ops);

If you genuinely need to ask “does this row have key X at all,” use the default class instead:

CREATE INDEX events_payload_full_idx
  ON events USING gin (payload);

Pick jsonb_path_ops by default. Reach for the larger jsonb_ops only when you actually use the existence operators, and let the access pattern drive it rather than guessing. The Postgres index type selector prompt is useful when you are unsure whether GIN, btree, or an expression index fits a given JSONB query.

Query With Containment

The @> containment operator is the workhorse for JSONB, and it is the one a GIN index accelerates. It asks “does the left document contain the right one as a subset.”

-- Uses the jsonb_path_ops GIN index
SELECT id, occurred_at
FROM events
WHERE payload @> '{"source": "mobile", "ab_test": "checkout_v2"}';

This is far better than reaching in with ->> and comparing, because payload->>'source' = 'mobile' cannot use the GIN index — it forces a sequential scan unless you build a dedicated expression index. Always confirm with EXPLAIN that your containment query hits the index:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM events
WHERE payload @> '{"source": "mobile"}';
-- Look for a Bitmap Index Scan on events_payload_idx

If you see a sequential scan, either the query is not expressed as containment or the index is missing.

Generated Columns for Hot Keys

When one key inside JSONB is queried constantly but you do not want to fully promote it, a stored generated column gives you a typed, indexable extraction without changing your write path. The application keeps writing the JSONB; Postgres maintains the derived column.

ALTER TABLE events
  ADD COLUMN source text
  GENERATED ALWAYS AS (payload->>'source') STORED;

CREATE INDEX events_source_idx ON events (source);

Now WHERE source = 'mobile' uses a plain, cheap btree index and produces accurate statistics, while the data still lives logically inside the payload. This is the cleanest way to make one hot JSONB key behave like a first-class column. Use it deliberately — every generated column adds storage and write cost — but for a key in the hot path it is well worth it.

Avoiding Bloat

JSONB has a quiet cost: updates rewrite the entire value. Postgres uses MVCC, so updating one key in a large JSONB document writes a whole new row version, and the old version waits for vacuum. A wide, frequently-updated JSONB column generates dead tuples fast and bloats the table.

A few habits keep this in check. Do not select the whole JSONB when you only need two keys — pull the keys you need so you are not shipping kilobytes per row across the wire. Keep frequently-updated fields out of JSONB and in normalized columns where a narrow update touches less data. And watch the table’s dead-tuple ratio:

SELECT relname,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0), 3) AS dead_ratio
FROM pg_stat_user_tables
WHERE relname = 'events';

If dead_ratio climbs and stays high, your JSONB update pattern is generating bloat faster than autovacuum clears it, and that is a signal to promote the churning keys out of the document. AI is good at reviewing an update workload and pointing at which keys are causing the churn; describe how the column is written and ask.

JSONB rewards design and punishes laziness. The teams who love it treated it as a deliberate choice — variable data in JSONB, stable data in columns, GIN with jsonb_path_ops for containment, generated columns for the hot keys, and an eye on bloat. The teams who regret it dumped everything into one data column and wondered why the database got slow. Let AI review your proposed schema with your real access patterns in hand; it reliably catches the columns that should not be JSONB. For the full design review, the JSONB schema design review prompt walks the whole checklist, and the Postgres category collects the indexing and tuning guides that go with it.