Querying External Data With Postgres Foreign Data Wrappers

A product manager asked me why a “simple” report needed three ETL jobs, an S3 bucket, and a nightly sync just to join orders in one Postgres database against customer records in another. The honest answer was that nobody had reached for foreign data wrappers. FDWs let a Postgres server read tables that live somewhere else — another Postgres instance, a CSV file on disk, sometimes an entirely different system — as if they were local tables. You write a normal SELECT, and Postgres fetches the rows over the wire. Done well, this removes whole categories of glue code. Done badly, it pulls a billion rows across the network to filter five of them locally. AI is genuinely useful for the boilerplate DDL and for reasoning about which predicates can be pushed down, but the wire behavior is something you confirm with EXPLAIN, not something you take on faith.

This is how I stand up an FDW link, let AI draft the scaffolding, and verify that it performs the way I think it does.

What an FDW actually is

The SQL/MED standard splits an external connection into three objects, and understanding the split is most of the battle. A foreign server describes where the data lives (host, port, database name). A user mapping says which local Postgres role authenticates as which remote credential. A foreign table is a local schema definition that points at a specific remote table or object. Add the extension that provides the wrapper itself, and you have the full stack: extension, server, user mapping, foreign table.

The two wrappers that ship in contrib cover most needs. postgres_fdw connects to another Postgres server and is the one worth mastering, because it supports predicate and join pushdown. file_fdw reads flat files (CSV, text) as tables, which is handy for log files and exports. There are third-party wrappers for MySQL, SQLite, object stores, and more, but the mechanics are identical, so the patterns here transfer.

Wiring up postgres_fdw

Here is the full sequence for reading a remote Postgres table. I keep credentials out of the SQL where I can, but the user mapping needs them somewhere.

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER orders_remote
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'db-orders.internal', port '5432', dbname 'orders');

CREATE USER MAPPING FOR app_reporting
  SERVER orders_remote
  OPTIONS (user 'fdw_reader', password 'CHANGE_ME');

CREATE FOREIGN TABLE remote_orders (
  order_id    bigint,
  customer_id bigint,
  total_cents integer,
  placed_at   timestamptz
)
  SERVER orders_remote
  OPTIONS (schema_name 'public', table_name 'orders');

A cleaner option for whole schemas is IMPORT FOREIGN SCHEMA, which introspects the remote side and creates foreign tables for you. It saves typing and avoids column-type drift, but it imports everything unless you constrain it.

CREATE SCHEMA orders_ext;

IMPORT FOREIGN SCHEMA public
  LIMIT TO (orders, order_items)
  FROM SERVER orders_remote
  INTO orders_ext;

file_fdw for flat files

When the “external system” is a CSV someone drops on a share, file_fdw reads it directly. No load step, no staging table.

CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER files
  FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE region_lookup (
  region_code text,
  region_name text
)
  SERVER files
  OPTIONS (filename '/var/lib/postgres/imports/regions.csv', format 'csv', header 'true');

The file has to be readable by the Postgres OS user and lives on the database server’s filesystem, which trips people up the first time. It is a read-only window onto the file, re-read on every query, so it is best for small reference data rather than anything hot.

Where AI helps, and where it does not

The DDL above is exactly the kind of structured-but-fiddly text where AI saves real time. I describe the remote table and the wrapper, and let it produce a first draft of the server, user mapping, and foreign table — then I check the column types against the remote catalog myself, because a numeric(10,2) quietly typed as integer will silently truncate.

Prompt: I have a remote Postgres table public.invoices with columns id bigint, account_id bigint, amount numeric(12,2), issued_on date, status text. Write postgres_fdw DDL to expose it locally as ext.invoices, assuming the server object billing_remote already exists. Flag any type-mapping risks.

Output: Here is the foreign table definition:
CREATE FOREIGN TABLE ext.invoices (
  id         bigint,
  account_id bigint,
  amount     numeric(12,2),
  issued_on  date,
  status     text
)
  SERVER billing_remote
  OPTIONS (schema_name 'public', table_name 'invoices');
Risk to verify: foreign table column types must match the remote types closely. numeric(12,2) is kept exactly here; if you had declared amount as double precision locally you would risk rounding. Confirm the remote amount precision in information_schema.columns before trusting aggregates.

That last line is the point. The draft is correct, but it tells me what to confirm rather than asserting it is safe. I run the check.

The performance question: pushdown

This is where FDWs earn or lose their keep. postgres_fdw can push a WHERE clause, and in recent versions an aggregate or a join between two foreign tables on the same server, to the remote side. When pushdown works, the remote server filters and you receive a few rows. When it does not, you pull the whole table and filter locally. The difference is enormous, and it is invisible until you look.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT customer_id, sum(total_cents)
FROM remote_orders
WHERE placed_at >= now() - interval '7 days'
GROUP BY customer_id;

Read the Remote SQL line in the output. If the WHERE and the GROUP BY appear in the remote query, the work is happening on the far side. If you instead see a Foreign Scan with no remote filter and a local Aggregate, you are dragging the table across the network. Common pushdown killers are non-immutable functions, locally-defined operators, and stale remote statistics — so run ANALYZE on the foreign table, which samples the remote side. Setting use_remote_estimate 'true' on the server lets the planner ask the remote for cost estimates, which usually improves join decisions at the cost of a planning round trip.

For a deeper read on interpreting plans in general, the Postgres slow query EXPLAIN triage prompt gives you a structured way to hand a plan to an AI and get back the suspicious nodes ranked.

Operational gotchas worth remembering

A foreign table is a live dependency on another system. If the remote server is down, your query errors. If the remote schema changes, your foreign table definition drifts and you get type errors or wrong results. I treat FDW links as part of the dependency graph: monitored, version-controlled DDL, and a clear owner on the remote side. Connection management matters too — postgres_fdw keeps connections open per remote server per session, so a connection-pooler in front of the foreign server is worth considering for high-concurrency workloads.

Security is the other piece. The user mapping password sits in the catalog and is visible to superusers, so the remote credential should be a least-privilege read-only role scoped to exactly the tables you expose. Never map to a remote superuser for convenience.

A verification checklist before you ship

Whatever the wrapper, I run the same sequence: confirm the extension is loaded, create the server and a least-privilege user mapping, define or import the foreign tables, run ANALYZE, and then EXPLAIN (VERBOSE) the real query to confirm pushdown. AI accelerates the first four and helps me reason about the fifth, but the EXPLAIN output is the source of truth. If you are choosing between an FDW link and a replicated copy, see the broader Postgres category guides, and if a remote join turns out to be index-bound rather than network-bound, the index advisor prompt is the next stop on the remote server.

Foreign data wrappers turn “build a pipeline” into “write a query” for a real class of problems. Let AI handle the DDL boilerplate, keep the credentials least-privilege, and always read the remote SQL in the plan before you call it fast.

Querying External Data With Postgres Foreign Data Wrappers and AI