Postgres Index Type Selector Prompt

Pick the right index access method — B-tree, BRIN, GIN, GiST, Hash, or SP-GiST — for a specific column and query pattern, reasoning from cardinality, data distribution, and the operators your queries actually use.

Target user

DBAs and backend engineers who reach for B-tree by reflex and want the access method that actually fits the column

Difficulty

Intermediate

Tools

Claude, ChatGPT, Cursor

You are a senior PostgreSQL DBA who knows that B-tree is the right answer most of the time and the wrong answer often enough to matter. You reason from first principles: an index type is a deal between the operators it can answer, the physical layout of the data, and the storage and maintenance cost it imposes. BRIN is nearly free but only pays off when the column correlates with physical row order; GIN is built for "is this contained in that" over composite values like JSONB, arrays, and tsvector; GiST and SP-GiST answer geometric and range overlap; Hash answers equality and nothing else. You match the method to the query, not to habit. I will paste: - The table and column(s) in question, with data type and rough row count: [TABLE/COLUMN] - The cardinality and distribution of the column (distinct values, skew, whether it correlates with insertion/physical order like a timestamp or serial): [DISTRIBUTION] - The exact predicates and operators these queries use (=, <, BETWEEN, @>, ?, &&, ILIKE, ORDER BY, etc.): [QUERY PATTERNS] - Write rate and how much index size/maintenance you can afford: [WRITE/STORAGE BUDGET] Do the following: 1. **Read the operators** — map each query's operators to the access methods that can actually serve them. Equality-and-range and ORDER BY point at B-tree; containment (@>), existence (?), and full-text point at GIN; overlap (&&) and nearest-neighbor point at GiST; pure equality on a huge non-ordered column can point at Hash. 2. **Weigh the data shape** — for naturally-ordered, append-mostly columns (event time, bigserial) on large tables, evaluate BRIN: it is tiny and cheap but only effective when physical/logical correlation is high. State the correlation assumption explicitly and how to check it (pg_stats correlation). 3. **Cost the trade-off** — for each viable method, give the win (which queries get faster) against the cost (index size, build time, write amplification, and any caveats such as GIN's slower single-row updates or the fastupdate pending-list behavior). 4. **Recommend and justify** — pick one primary recommendation per query pattern, note any opclass that matters (for example a path-ops opclass for JSONB containment, or a trigram opclass for ILIKE), and call out when the honest answer is "B-tree is fine." Output format: a decision table [column/query | operators | recommended index type + opclass | why this beats B-tree here | cost caveat], followed by the exact CREATE INDEX statements. Guardrails: build candidate indexes on a replica or snapshot first and confirm the planner actually chooses them with EXPLAIN (ANALYZE, BUFFERS) on representative data — an index the planner ignores is pure write cost. For BRIN, verify the correlation assumption holds before trusting it, since a reordered or randomly-inserted column makes BRIN nearly useless. Build with CREATE INDEX CONCURRENTLY in production and keep a human reviewing each choice; do not swap a working B-tree for an exotic type without measured proof it wins.

Why this prompt works

Most index problems are not missing indexes — they are the wrong kind of index. Engineers default to B-tree because it is the implicit choice behind PRIMARY KEY and UNIQUE, and then wonder why a JSONB containment query or a timestamp-range scan over a billion rows stays slow. This prompt reframes index selection as a deal between three forces: the operators the query uses, the physical shape of the data, and the storage-and-maintenance budget. Starting from the operators is the key move, because the access method is fundamentally defined by which operator classes it can answer, and that immediately rules most candidates in or out.

The distribution and cost steps are where the genuinely useful decisions live. BRIN is the headline example: it is almost free and can replace a huge B-tree on an append-mostly time-series table, but it collapses to a near-full scan the moment physical correlation breaks down. By forcing the correlation assumption to be stated and checked against pg_stats, the prompt prevents the common failure of recommending BRIN on a column that has been shuffled by updates or out-of-order inserts. The same discipline applies to GIN, where the win on containment and full-text comes with a real write-side cost that has to be measured, not assumed.

Crucially, the prompt is honest about when nothing beats B-tree. Exotic index types are over-prescribed by people who just learned they exist, and a GiST or Hash index that the planner never chooses is pure write amplification. By ending every recommendation with an EXPLAIN-on-a-replica guardrail and an explicit “B-tree is fine” escape hatch, the output stays grounded in measured behavior rather than novelty, and a human still approves each swap before it touches production.

Postgres Index Type Selector Prompt

Why this prompt works

Related prompts

Postgres Index Advisor from pg_stat_statements Prompt

Why this prompt works

Related prompts

Postgres Index Advisor from pg_stat_statements Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet