Tuning postgresql.conf for Your Workload With AI

Someone on my team once found a “Postgres tuning guide” from 2013, copied its postgresql.conf wholesale onto a 64GB server, and set work_mem to a value that — multiplied across our connection count — could have requested several times the machine’s RAM under load. The settings were “optimized.” They were optimized for a different machine, a different Postgres version, and a different workload. This is the core problem with config tuning: the right value for almost every setting depends on your hardware, your workload shape, and your concurrency, none of which a generic guide knows. AI is a much better tuning partner than a static guide because it reasons about your numbers — but it still can’t measure your workload. You provide the facts; it reasons; you verify.

This is how I tune postgresql.conf without cargo-culting magic numbers.

Start from facts, not a template

Before changing anything, gather the inputs that actually drive the math: total RAM, core count, whether the box is dedicated to Postgres, read/write ratio, and concurrency. Then feed those to AI as the basis for reasoning, not a blank “tune my database” request.

This is a dedicated Postgres 16 server: 32GB RAM, 8 cores, NVMe storage, OLTP workload, ~150 concurrent connections through PgBouncer, read-heavy (roughly 90/10 read/write). Recommend starting values for shared_buffers, effective_cache_size, work_mem, maintenance_work_mem, and random_page_cost, and explain the reasoning for each. Flag any setting where my concurrency makes the obvious value dangerous.

The “concurrency makes the obvious value dangerous” clause is the important one — it forces the model to catch the work_mem multiplication trap instead of just handing back a number.

The memory settings that matter most

shared_buffers is Postgres’s own cache. The common starting point is around 25% of RAM on a dedicated box — more isn’t automatically better because the OS page cache also caches your data and double-buffering wastes memory. effective_cache_size doesn’t allocate anything; it tells the planner how much memory it can assume is available for caching, which influences whether it chooses index scans.

# 32GB dedicated box, starting points
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB

work_mem is the dangerous one. It’s allocated per sort/hash operation, per query — so a single complex query can use several multiples of it, and across many connections it multiplies fast. Set it conservatively and raise it only for specific reporting queries via SET LOCAL.

work_mem = 32MB    # per operation — be conservative with high concurrency

-- bump it just for one heavy analytical query, not globally
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT ... ;  -- big GROUP BY / sort
COMMIT;

Tell the planner the truth about your storage

random_page_cost defaults to 4.0, a ratio that assumed spinning disks where random reads were far slower than sequential. On SSD/NVMe, random reads are nearly as cheap as sequential, and leaving it at 4.0 biases the planner against index scans it should be using.

random_page_cost = 1.1     # NVMe/SSD — random reads are cheap
effective_io_concurrency = 200

This one setting fixes more “why won’t it use my index” mysteries than anything else on flash storage. I ask AI to confirm the value for my disk type, then verify by checking that previously-sequential plans flip to index scans.

WAL and checkpoint settings for write workloads

If you write meaningfully, default checkpoint behavior causes I/O spikes as Postgres flushes dirty pages in bursts. Spreading checkpoints out smooths the I/O.

wal_compression = on
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9   # spread the flush over the interval
max_wal_size = 8GB

My workload has write bursts that cause checkpoint I/O spikes and latency stalls. Explain how checkpoint_completion_target, checkpoint_timeout, and max_wal_size interact, and recommend values to smooth the I/O without keeping WAL around so long that crash recovery becomes slow. Note the recovery-time tradeoff explicitly.

The recovery tradeoff matters: larger max_wal_size means fewer checkpoints but longer crash recovery. AI surfaces that tension; you decide where you want to sit on it.

Change one thing, measure, repeat

The discipline is identical to query tuning. Most of these need a reload, some need a restart:

ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();           -- for reloadable settings

-- check what actually took effect and whether a restart is pending
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'random_page_cost');

After each change, run your representative load and compare against the baseline you captured before touching anything. If a setting doesn’t move your measured latency or throughput, revert it — an “optimization” that does nothing is just config drift you’ll have to explain later.

The honest summary

Config tuning is where AI genuinely outperforms a static guide, because it reasons about your specific RAM, cores, storage, and concurrency instead of handing you someone else’s numbers. But “reasons about” is not “measures.” Every value it suggests is a hypothesis you validate against your own before/after numbers, one change at a time. The biggest wins — random_page_cost on flash, conservative work_mem, smoothed checkpoints — are also the most workload-dependent, which is exactly why you measure. More tuning material is in the Postgres guides, and I keep my config-review prompts in the prompt library so each box gets reasoned about from its real facts rather than a template.