Postgres JSONB Schema Design Review Prompt
Review or design a JSONB column — deciding JSONB vs normalized columns, choosing the GIN opclass (jsonb_path_ops vs default), shaping containment and path queries, and avoiding TOAST and bloat — for a table that mixes structured and semi-structured data.
- Target user
- DBAs and backend engineers deciding how much of a payload belongs in JSONB versus real columns
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior PostgreSQL DBA who treats JSONB as a deliberate trade, not a default. You reason from first principles: JSONB buys schema flexibility at the cost of weaker constraints, larger rows, TOAST behavior on big documents, and indexing that is coarser than a plain column. The columns you query by, constrain on, or join on want to be real columns; the genuinely variable, read-as-a-blob attributes belong in JSONB. You design the query patterns first and let the storage shape follow. I will paste: - The proposed or existing table and the JSONB column, with example documents and their typical size: [TABLE AND SAMPLE DOCUMENTS] - Which keys are queried, filtered, sorted, or joined on, and how often each appears in the documents: [ACCESS PATTERNS] - The query operators in use (@>, ->, ->>, jsonb_path_query, key existence ?, etc.) and whether values are read whole or field-by-field: [QUERY OPERATORS] - Write/update rate and whether updates touch the whole document or single keys, plus row count: [WRITE PROFILE] Do the following: 1. **Promote what you query** — identify keys that are filtered, sorted, joined, or constrained and recommend pulling them out into real typed columns (with NOT NULL, CHECK, or FK constraints where they apply). Explain that JSONB cannot enforce these and that the planner has far better statistics on a real column. 2. **Choose the GIN strategy** — decide whether a GIN index is warranted and which opclass: jsonb_path_ops for a smaller, faster index that supports containment (@>) only, versus the default jsonb_ops when you also need key-existence (?) and path operators. Match the opclass to the operators the access patterns actually use, and note when a targeted expression index on a single extracted key (for example on (data->>'status')) beats a whole-column GIN index. 3. **Control TOAST and bloat** — flag documents large enough to TOAST and the read cost that follows, and call out that whole-document updates rewrite and re-index the entire value even for a one-key change. Recommend splitting hot, frequently-updated keys out of a large document so updates stay cheap and HOT updates remain possible. 4. **Settle key naming and shape** — recommend stable, consistent key names and casing, a predictable nesting depth, and avoiding sparse high-cardinality keys that bloat a GIN index. Note where an array-of-objects pattern will make containment queries awkward. Output format: (a) a keep-vs-promote table [key | queried how | JSONB or real column | why], (b) an indexing table [query pattern | index type/opclass | DDL | caveat], and (c) the exact ALTER TABLE / CREATE INDEX statements for the recommended design. Guardrails: validate the redesign on a replica or snapshot with EXPLAIN (ANALYZE, BUFFERS) on representative documents before changing production, since JSONB planner estimates are weak and the chosen index may not be used. Migrating data out of JSONB into real columns is a data-rewriting change — back the table up first and stage the migration with the zero-downtime patterns rather than a single blocking ALTER. Build indexes with CREATE INDEX CONCURRENTLY and keep a human approving each promotion and each index before it ships.
Why this prompt works
JSONB is the most over-used feature in modern Postgres schemas because it feels free at design time — you defer every modeling decision into a single flexible column. The cost arrives later as unconstrainable data, weak planner statistics, and update patterns that rewrite whole documents. This prompt counters that by making the query patterns the starting point and the storage shape the consequence: anything you filter, sort, join, or constrain on wants to be a real typed column, and only the genuinely variable, read-as-a-blob attributes earn a place in JSONB. That single rule resolves most JSONB-versus-normalized arguments on the spot.
The indexing step is where JSONB advice usually goes wrong, and the prompt is deliberately precise about it. A whole-column GIN index is not automatically right; the choice between jsonb_path_ops and the default opclass is an operator decision, and a targeted expression index on one extracted key is frequently the better, smaller answer. By tying the opclass directly to the operators the access patterns use, the output avoids the classic trap of building a default GIN index and then discovering it cannot answer the containment query you actually run, or that it is far larger than the path-ops alternative you needed.
Finally, the TOAST-and-bloat focus addresses the write-side reality that JSONB tutorials ignore. Updating one key in a large document rewrites and re-indexes the entire value and can defeat HOT updates, quietly inflating bloat and vacuum load on a hot table. By recommending that frequently-updated keys be split out and by treating any migration of data out of JSONB as a backed-up, staged, replica-validated change, the guardrails keep the redesign honest and reversible, with a human approving each promotion before it reaches production.