Connecting Services Privately With AWS PrivateLink and VPC Endpoints
Interface vs gateway endpoints, endpoint policies, private DNS, and cross-account PrivateLink services — drafted with AI and verified against how the traffic actually flows.
- #aws
- #ai
- #privatelink
- #vpc
- #networking
Every team eventually hits the moment where a security review asks the same question: “Does this traffic to S3 leave our VPC?” If the answer is “it goes out the NAT gateway to a public endpoint,” you have both a data-egress story to tell and a NAT bill to explain. PrivateLink and VPC endpoints are how you make that answer “no, it stays on the AWS network.” But the feature has enough moving parts — two endpoint types, endpoint policies, private DNS overrides, and a whole cross-account service model — that it’s easy to wire up something that looks private and isn’t.
I lean on an AI assistant heavily when designing endpoint topologies, mostly because it’s good at remembering which services are gateway-only and at drafting endpoint policies I’d otherwise copy-paste wrong. But every draft gets checked against one question: where does the packet actually go? Let’s walk the topology.
Gateway vs interface endpoints
Only two AWS services offer gateway endpoints: S3 and DynamoDB. Everything else is reached through an interface endpoint (S3 and DynamoDB offer those too, at interface-endpoint prices). This is the single most important fact to internalize, because the two types behave nothing alike.
A gateway endpoint is a route-table entry. You don’t get an ENI or an IP. Instead, AWS adds a route to each route table you associate, with the service’s managed prefix list as the destination and the endpoint as the target, so traffic to that service’s public IP ranges stays on the AWS backbone. It’s free, and it only works from inside the VPC whose route tables reference it.
An interface endpoint (this is PrivateLink proper) creates an actual elastic network interface with a private IP in each subnet you specify. It costs per-hour-per-AZ plus per-GB, and because it’s a real ENI, it’s reachable from peered VPCs, Direct Connect, and VPN — anywhere that can route to that subnet.
Here’s a gateway endpoint for S3:
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0a1b2c3d4e5f60718 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0aa11bb22cc33dd44 rtb-0ee55ff66aa77bb88 \
  --policy-document file://s3-endpoint-policy.json
And an interface endpoint for the ECR API, with private DNS enabled:
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0a1b2c3d4e5f60718 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --subnet-ids subnet-0a1b2c3d subnet-0e5f6a7b \
  --security-group-ids sg-0c0ffee0c0ffee0c0 \
  --private-dns-enabled
Note the security group on the interface endpoint. The ENI lives in a subnet and is governed by a security group like any other ENI — if your workloads can’t reach the endpoint on 443, check that first. The gateway endpoint has no security group because there’s no ENI; you control its reach with route-table associations and the endpoint policy.
Endpoint policies are a second gate, not the only gate
A common misread is treating the endpoint policy as the access-control mechanism. It isn’t: it’s an additional filter layered on top of IAM and resource policies. A request through the endpoint succeeds only if the endpoint policy allows it and the usual IAM and resource-policy evaluation allows it too. An endpoint policy that allows everything ("*") doesn’t grant anything; it just declines to further restrict.
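The "filter, not grant" behaviour can be sketched as a set intersection. This is a deliberately simplified model, not how AWS evaluates policies internally: the permission strings, the `effective_permissions` name, and the use of `None` for a full-access policy are all assumptions made for illustration, and explicit denies and conditions are ignored.

```python
from typing import FrozenSet, Optional

# Hypothetical model: a permission is just an "action resource" string.
Permissions = FrozenSet[str]


def effective_permissions(
    granted_elsewhere: Permissions,
    endpoint_policy: Optional[Permissions],
) -> Permissions:
    """Return what a request through the endpoint can actually do.

    granted_elsewhere: what IAM and resource policies already allow.
    endpoint_policy:   what the endpoint policy allows; None models the
                       full-access policy ("*").
    """
    if endpoint_policy is None:
        # A "*" endpoint policy filters nothing, and grants nothing either.
        return granted_elsewhere
    # Otherwise the endpoint policy can only narrow the set.
    return granted_elsewhere & endpoint_policy


granted = frozenset({"s3:GetObject acme-prod-logs", "s3:DeleteObject acme-prod-logs"})
endpoint = frozenset({"s3:GetObject acme-prod-logs", "s3:PutObject acme-prod-logs"})

# Only the permission present on both sides survives.
print(sorted(effective_permissions(granted, endpoint)))
# A caller with no IAM permissions gets nothing, even through a "*" endpoint.
print(sorted(effective_permissions(frozenset(), None)))
```

The second call is the point of the model: widening the endpoint policy never creates access that IAM did not already give.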
Where endpoint policies earn their keep is scoping the blast radius of the endpoint itself. This one says: traffic through this S3 endpoint may only touch our two buckets, and only from our org.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictToOwnedBuckets",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::acme-prod-artifacts",
        "arn:aws:s3:::acme-prod-artifacts/*",
        "arn:aws:s3:::acme-prod-logs",
        "arn:aws:s3:::acme-prod-logs/*"
      ],
      "Condition": {
        "StringEquals": { "aws:PrincipalOrgID": "o-abc123def4" }
      }
    }
  ]
}
The aws:PrincipalOrgID condition is the part people forget. Without it, this endpoint will happily serve any IAM principal that can route to it. With it, even a misconfigured cross-account role can’t pull data through your endpoint into someone else’s account.
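Because the org-ID condition is easy to forget, a small lint step can flag endpoint policies that lack it. The sketch below is an assumption-laden helper, not an AWS tool: `statements_missing_org_condition` is a hypothetical name, and it only checks that the `aws:PrincipalOrgID` key appears somewhere in each Allow statement's `Condition` block. It does not check which operator is used or whether the org ID is the right one.

```python
import json


def statements_missing_org_condition(policy_json: str) -> list:
    """Return the Sid (or position) of every Allow statement that has no
    aws:PrincipalOrgID condition key."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may carry a single statement as a bare object.
        statements = [statements]

    missing = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        condition = statement.get("Condition", {})
        # Condition is {operator: {key: value}}; compare keys case-insensitively.
        has_org_key = any(
            key.lower() == "aws:principalorgid"
            for operator_block in condition.values()
            for key in operator_block
        )
        if not has_org_key:
            missing.append(statement.get("Sid", f"statement[{index}]"))
    return missing
```

Pointing it at the contents of a file such as `s3-endpoint-policy.json` before running `create-vpc-endpoint` gives an empty list when every Allow statement carries the condition.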
Private DNS: the part that silently fails
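--private-dns-enabled applies to interface endpoints: it is what makes a service hostname such as api.ecr.us-east-1.amazonaws.com resolve to the endpoint’s private IPs instead of the public ones. (A gateway endpoint changes routing, not DNS, so S3 hostnames still resolve to public IPs behind one.) It only works if your VPC has both enableDnsSupport and enableDnsHostnames set to true, and if your workloads actually query the VPC’s own resolver. When private DNS is not really in effect, for example because instances use custom DNS servers that bypass the VPC resolver, your traffic keeps going out the NAT gateway: no error, just a quietly broken private path.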
Verify it from inside the VPC, not from your laptop:
# From an instance in the VPC: should return the endpoint's private IPs
dig +short api.ecr.us-east-1.amazonaws.com
# Confirm both VPC attributes are actually enabled (one attribute per call)
aws ec2 describe-vpc-attribute --vpc-id vpc-0a1b2c3d4e5f60718 \
  --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id vpc-0a1b2c3d4e5f60718 \
  --attribute enableDnsHostnames
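The dig check can also be scripted so it runs from a workload rather than by hand. This is a minimal sketch using only the Python standard library; the function names and the idea of comparing against the VPC CIDR are assumptions for illustration, and the script has to run from inside the VPC to mean anything.

```python
import ipaddress
import socket


def resolve_ipv4(hostname: str) -> list:
    """Resolve a hostname to its sorted, de-duplicated IPv4 addresses."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def all_inside_cidr(addresses, cidr: str) -> bool:
    """True only if there is at least one address and every address
    falls inside the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    return bool(addresses) and all(
        ipaddress.ip_address(address) in network for address in addresses
    )


def uses_private_path(hostname: str, vpc_cidr: str) -> bool:
    """Resolve the service hostname and check it lands inside the VPC."""
    return all_inside_cidr(resolve_ipv4(hostname), vpc_cidr)
```

A call such as `uses_private_path("api.ecr.us-east-1.amazonaws.com", "10.0.0.0/16")` returns False when the name still resolves to public addresses. The CIDR here is a placeholder for your own VPC range.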
This is exactly the kind of step I ask an assistant to enumerate, because it’s the difference between “configured” and “working.”
Prompt: I enabled private DNS on an interface endpoint for ecr.dkr but docker pull from my ECS tasks still seems to use the public path. What should I check, in order?
Reply: Walk it from DNS outward. First, ECR needs two interface endpoints (ecr.api for the control plane and ecr.dkr for image layers), and pulls also hit S3, so you need an S3 gateway (or interface) endpoint too; a missing S3 path is the most common cause here. Second, confirm both enableDnsSupport and enableDnsHostnames are true on the VPC, or private DNS won’t override resolution. Third, check the endpoint’s security group allows 443 inbound from your task subnets. Fourth, run dig for dkr.ecr.<region>.amazonaws.com from a task ENI and confirm it resolves to a private (10.x/172.x) address. I can’t see your account, so treat the S3-endpoint check as the first thing to verify rather than an assumption.
That answer is correct and, crucially, it flags the S3 dependency for ECR pulls that the original question didn’t mention — but it also tells me to go verify rather than trust it. I confirmed the missing S3 gateway endpoint was the culprit. That’s the workflow: the model surfaces the checklist, I run the dig.
Cross-account services with PrivateLink
The most powerful use of PrivateLink is exposing your own service to consumers in other accounts or VPCs without VPC peering. You put a Network Load Balancer in front of your service, wrap it in an endpoint service, and consumers create interface endpoints that connect to it. Connections are initiated in one direction only: consumer → your NLB. The consumer never sees your VPC CIDR, and you never see theirs, so overlapping IP ranges are a non-issue.
# Provider side: publish the service behind an NLB
aws ec2 create-vpc-endpoint-service-configuration \
  --network-load-balancer-arns arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/acme-svc/abc123 \
  --acceptance-required \
  --supported-ip-address-types ipv4
# Provider side: allow a specific consumer account to find it
aws ec2 modify-vpc-endpoint-service-permissions \
  --service-id vpce-svc-09a1b2c3d4e5f6071 \
  --add-allowed-principals arn:aws:iam::444455556666:root
--acceptance-required means each consumer connection lands in a pendingAcceptance state until you approve it — worth keeping on for anything sensitive, since it gives you a manual gate and an audit trail of who connected.
On the consumer side, they create an interface endpoint pointing at your service name (com.amazonaws.vpce.us-east-1.vpce-svc-...) exactly like any AWS service endpoint. Because the service name is region-scoped and unguessable, plus the allowed-principals list, you have two independent controls before any packet flows.
One gotcha the AI got right and I’d have missed: when --acceptance-required is on, consumers’ DNS won’t resolve to your endpoint until after you accept. So if a consumer reports “endpoint created but connection refused,” check the connection state on the provider side first.
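Checking the connection state on the provider side can be partly automated. The sketch below assumes you have saved the JSON output of `aws ec2 describe-vpc-endpoint-connections` and want to list connections still waiting for approval. The `pending_connections` helper and the allow-list of account IDs are hypothetical; the field names follow the shape of that command's output as I understand it, so confirm them against your own output.

```python
import json


def pending_connections(describe_output: str, allowed_accounts: set) -> list:
    """List endpoint connections in pendingAcceptance, flagging whether
    each one comes from an account we expected."""
    data = json.loads(describe_output)
    pending = []
    for connection in data.get("VpcEndpointConnections", []):
        if connection.get("VpcEndpointState") != "pendingAcceptance":
            continue
        owner = connection.get("VpcEndpointOwner")
        pending.append(
            {
                "endpoint_id": connection.get("VpcEndpointId"),
                "owner": owner,
                # Unexpected owners deserve a closer look before accepting.
                "expected": owner in allowed_accounts,
            }
        )
    return pending
```

Anything returned with `expected` set to True is a candidate for `aws ec2 accept-vpc-endpoint-connections`. The script only reports and leaves the approval as a manual step, which keeps the manual gate meaningful.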
Wiring it into a repeatable pattern
Once you’ve done this twice, codify it. The decision tree is small enough to make routine: S3 or DynamoDB gets a free gateway endpoint on every route table; everything else is an interface endpoint with a security group scoped to your workload subnets, private DNS on, and an endpoint policy with an org-ID condition. Cross-account exposure goes through an NLB-backed endpoint service with acceptance required.
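One way to codify that decision tree is a small function that a provisioning script or review checklist can call. This is a sketch of the rules as stated in this post, not an AWS API: the function names, dictionary keys, and string values are all assumptions made for illustration.

```python
# Services that offer a free gateway endpoint, per the rule of thumb above.
GATEWAY_SERVICES = {"s3", "dynamodb"}


def plan_endpoint(service: str) -> dict:
    """Return the endpoint settings this post's decision tree recommends."""
    name = service.strip().lower()
    if name in GATEWAY_SERVICES:
        return {
            "service": name,
            "type": "Gateway",
            "route_tables": "all",       # associate with every route table
            "security_group": None,      # no ENI, so nothing to attach
            "private_dns": False,        # gateway endpoints work by routing
            "policy_org_id_condition": True,  # as in the S3 policy example
        }
    return {
        "service": name,
        "type": "Interface",
        "route_tables": None,
        "security_group": "scoped to workload subnets",
        "private_dns": True,
        "policy_org_id_condition": True,
    }


def plan_endpoint_service() -> dict:
    """Settings for exposing your own service to other accounts."""
    return {"front_end": "Network Load Balancer", "acceptance_required": True}


print(plan_endpoint("S3")["type"])
print(plan_endpoint("ecr.api")["type"])
```

Keeping the rules in one place means an exception (say, an S3 interface endpoint for on-premises access) has to be made deliberately, not by accident.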
I keep that decision tree, plus the verification commands, as a reusable prompt so the next engineer doesn’t rediscover the ECR-needs-S3 trap. If you want starting points, browse the AWS guides and the networking entries in the prompt library — the endpoint-policy generator pairs well with the cross-account pattern above. For the broader cost and reliability framing, the AI-assisted AWS Well-Architected review covers where private connectivity fits in the security and cost-optimization pillars.
The throughline is the same as everywhere else: let the assistant draft the policy and the checklist, then trace the packet yourself. PrivateLink fails quietly, and “quietly” is the worst failure mode for a security control.