Essential Postgres Extensions: pg_stat_statements, pg_repack, TimescaleDB With AI
A field guide to three Postgres extensions that earn their keep: query observability, lock-free bloat removal, and time-series, with AI reading your output.
- #postgres
- #ai
- #extensions
- #pg-stat-statements
- #timescaledb
Postgres ships with a small core and an enormous extension ecosystem, and the difference between a team that struggles with Postgres and one that runs it calmly is usually which extensions they have installed before the incident, not after. I have a short list of extensions I install on practically every production database, because they convert problems that are otherwise invisible or painful into routine operations. Three earn their place every time: pg_stat_statements for seeing what the database actually spends its time on, pg_repack for reclaiming bloat without taking a heavy lock, and TimescaleDB when the workload is genuinely time-series. AI is a strong partner here, especially for reading the firehose of numbers pg_stat_statements produces — but the install decisions and the verification stay with you, because an extension is code running inside your database.
Here is when I reach for each, how to set it up, and where AI fits.
pg_stat_statements: see where the time goes
If I could install exactly one extension on a database, it would be this one. pg_stat_statements aggregates execution statistics per normalized query — calls, total time, rows, cache hits — so instead of guessing which query is slow, you measure it. It is the foundation that the other two extensions’ decisions often rest on.
It requires a shared library load, which means a config change and a restart.
# In postgresql.conf
# shared_preload_libraries = 'pg_stat_statements'
# Then restart Postgres, and in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- The query I run first on any unfamiliar database:
SELECT
queryid,
calls,
round(total_exec_time::numeric, 1) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Sort by total_exec_time and the top of that list is where your database lives. A query that runs in two milliseconds but executes a million times can outweigh a five-second report that runs once an hour. That ranking is the single most useful artifact for prioritizing work.
The raw output is dense, and this is exactly where AI shines: not at running the query, but at reading two columns together and spotting the pattern. I paste the result and ask for an interpretation, then I check its reasoning against what I know about the schema.
Prompt: Here is my top
pg_stat_statementsoutput by total time. Query A: 1.2M calls, 480000 ms total, 0.4 ms mean. Query B: 90 calls, 410000 ms total, 4555 ms mean. Both are in my top three by total time. How should I prioritize, and what’s the likely fix for each?Output: These are two different problems. Query B is a classic slow query — 4.5 s mean over few calls — so the lever is the plan: look for a missing index or a bad join, and triage it with
EXPLAIN (ANALYZE, BUFFERS). Query A is a frequency problem — each call is cheap at 0.4 ms but 1.2M calls dominate total time. The fix is rarely the query itself; it’s reducing call volume (caching, batching, an N+1 pattern in the app) or shaving the per-call cost. Prioritize B first if user-facing latency is the complaint; prioritize A first if you’re CPU-bound. Confirm which by checking whether mean times are stable under load.
That framing — frequency problem versus latency problem — is the right mental split, and the AI surfaces it from two numbers. I still verify by pulling the actual plan, which the slow query EXPLAIN triage prompt is built for. Once you know which queries hurt, the index advisor prompt reads the same view to suggest indexes.
pg_repack: remove bloat without the lock
When a table or index has accumulated bloat — dead space that autovacuum reclaims for reuse but does not return to the OS — the textbook fix is VACUUM FULL. The problem is that VACUUM FULL takes an ACCESS EXCLUSIVE lock for its entire duration, which on a large hot table means an outage. pg_repack does the same physical rewrite online, holding only a brief exclusive lock at the start and end, while the bulk of the work happens with the table readable and writable.
It needs the extension plus a client-side binary.
CREATE EXTENSION IF NOT EXISTS pg_repack;
# Repack a single bloated table, online:
pg_repack -d mydb -t public.events --no-order
# Or repack just the indexes on a table:
pg_repack -d mydb -t public.events --only-indexes
A few realities to plan for: pg_repack needs roughly twice the table’s disk during the rebuild because it builds a copy, the target table must have a primary key or a non-null unique index, and it does real I/O, so I schedule it off-peak even though it does not block. I reach for it when bloat has already accumulated and I need to recover space without downtime. Preventing the bloat in the first place is an autovacuum-tuning job — a different lever, covered by the vacuum and autovacuum tuning prompt — and the two complement each other: tune autovacuum to keep bloat low, use pg_repack to recover from the times it got away from you.
TimescaleDB: when the workload is time-series
The third extension is conditional. If your dominant access pattern is append-heavy time-series data — metrics, events, sensor readings, anything with a timestamp and a steady insert stream you query by time range — TimescaleDB transforms how Postgres handles it. Its core abstraction is the hypertable: a regular-looking table that is automatically partitioned into time-based chunks under the hood. Queries that filter by time prune to the relevant chunks, inserts go to the newest chunk, and old chunks can be compressed or dropped wholesale.
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE sensor_data (
ts timestamptz NOT NULL,
sensor_id bigint,
temperature double precision
);
-- Turn it into a hypertable partitioned weekly:
SELECT create_hypertable('sensor_data', 'ts', chunk_time_interval => INTERVAL '7 days');
The wins are real: time-bucketed aggregation via time_bucket(), native compression on older chunks, and retention policies that drop aged data without a giant DELETE. But TimescaleDB is a substantial dependency with its own upgrade cadence and operational model, so I only install it when the workload truly is time-series. For ordinary relational data with occasional time filters, plain declarative partitioning on a range key is simpler and keeps you on stock Postgres.
The verification discipline for any extension
An extension runs code inside your database, so I treat installs like dependency upgrades: pin versions, read the changelog, test in staging, and confirm the privilege footprint. AI is excellent at explaining what an extension does, drafting the setup SQL, and interpreting the output you feed it — and it is genuinely fast at turning pg_stat_statements numbers into a prioritized plan. What it cannot do is decide that your workload is time-series, or confirm that pg_repack had the disk headroom it needed. You run the checks; the AI reads the results. Browse the Postgres category for the deeper dives on each of these, and install the observability extension before you need it — measuring after the incident is always too late.
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.