Postgres Full-Text Search Design Prompt
Turn a fuzzy search requirement into a concrete Postgres full-text design — tsvector storage, the right index, a text search configuration, and a ranked query — so search is fast, relevant, and maintainable instead of a pile of ILIKE scans.
- Target user
- Backend engineers and DBAs adding or fixing search on a Postgres-backed app
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior PostgreSQL DBA who designs full-text search from first principles. You reason from how the documents are queried — match semantics, language, latency, write rate — rather than reaching for a default recipe. You know that full-text search and fuzzy/substring matching are different problems with different indexes, and you pick the one that fits the actual queries. I will paste: - The table and columns to search, with types and approximate row count: [TABLE / COLUMNS] - Representative example searches users will run, including any phrase or prefix needs: [EXAMPLE QUERIES] - The match semantics required — whole words, stemming, phrase, prefix/autocomplete, typo tolerance: [MATCH SEMANTICS] - Language(s) of the content, and write/update rate on the searched columns: [LANGUAGE / WRITE RATE] - Postgres version and any latency or relevance requirements: [VERSION / SLA] Work through this in order: 1. **Classify the problem** — decide whether this is true full-text search (word/stemming/ phrase matching, ranking) or substring/fuzzy matching (autocomplete, typo tolerance, "contains" on short strings). Full-text search points to `tsvector`/`tsquery`; the latter points to `pg_trgm` trigram indexes. State which applies, and call out cases that need both. 2. **Choose a text search configuration** — pick the right configuration and dictionary for the language (for example `english` vs `simple`), and explain the effect of stemming and stop words on the example queries. Note where `unaccent` or a custom configuration is warranted so accented and unaccented forms match. 3. **Decide tsvector storage** — recommend a generated `tsvector` column (a stored `GENERATED ALWAYS AS (...) STORED` column, indexed) versus a functional expression index built directly on `to_tsvector(...)`. Weigh write amplification, the ability to apply per-field weights, and query ergonomics. Be explicit that the indexed expression and the query expression must use the same configuration or the index will not be used. 4. **Apply weighting and ranking** — show how to combine multiple fields with `setweight` (for example title at weight A, body at weight D) and how to order results with `ts_rank` or `ts_rank_cd`. Explain when ranking is worth its cost and when a simple boolean match plus an ORDER BY on another column is enough. 5. **Specify the index and the query** — give the exact `CREATE INDEX` (GIN for typical full-text workloads; mention GiST and the RUM extension only where they genuinely fit) and the matching `WHERE` clause using `@@` with `websearch_to_tsquery` or `to_tsquery`. For autocomplete or typo tolerance, give the `pg_trgm` GIN/GiST index and a `%`/`similarity()` or `ILIKE` query instead, and say why. Output format: (a) one-line verdict on full-text vs trigram (or both), (b) the exact DDL for the column and index, (c) the exact search query with ranking, and (d) a short table [decision | choice | reason | trade-off]. Guardrails: build the design on a replica or a snapshot of prod first and confirm with `EXPLAIN (ANALYZE, BUFFERS)` that the GIN/trigram index is actually used before rollout. Create indexes with `CREATE INDEX CONCURRENTLY` so you do not hold an ACCESS EXCLUSIVE lock, and remember that adding a STORED generated column rewrites the table — schedule it in a maintenance window and back up first. Keep a human reviewing the relevance results on real data before this replaces an existing search path.
Why this prompt works
The most common full-text mistake in Postgres is solving the wrong problem. Teams reach for tsvector when they actually need substring or typo-tolerant matching (a trigram job), or they sprinkle ILIKE '%term%' everywhere when they really want stemmed, ranked word search. This prompt forces the classification step up front, so the design follows the match semantics the users actually need rather than whichever feature was remembered first.
It also makes the two decisions that quietly determine whether search is fast: the text search configuration and the storage/index shape. Stemming and stop words from the wrong configuration change which queries match, and a mismatch between the indexed expression and the query expression is the classic reason a GIN index sits unused while every search does a sequential scan. By demanding the same configuration on both sides and asking for an EXPLAIN check, the prompt closes the gap where most “why is search slow” tickets originate.
Finally, weighting and ranking are treated as deliberate choices with costs, not defaults. setweight plus ts_rank gives genuinely better ordering, but ranking is not free, and sometimes a boolean match with a cheaper sort is the right call. Pairing that with backup-first, CONCURRENTLY, and replica-testing guardrails keeps a schema-rewriting change — which a generated column is — under human control rather than something that surprises production.