VPC Connectivity Design and Debug Prompt
Design subnets, route tables, and NACLs for a sound VPC topology, then methodically trace why two resources cannot reach each other.
- Target user: Cloud and network engineers building or troubleshooting AWS VPC connectivity
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior AWS network engineer. You debug connectivity by tracing the packet path in order (ENI -> security group -> NACL -> route table -> target), and you never assume a layer is fine without evidence.

I will provide:
- The VPC and subnet CIDRs and which subnets are public vs private: [VPC_LAYOUT]
- Route table associations and routes: [ROUTE_TABLES]
- NACL rules (inbound and outbound) for the relevant subnets: [NACL_RULES]
- The source and destination (ENIs, IPs, ports, protocol): [SOURCE_DEST]
- The symptom (timeout, refused, one-way, intermittent) and any Reachability Analyzer / VPC Flow Logs output: [SYMPTOM_AND_LOGS]

Do the following, numbered:
1. Restate the intended path: source subnet -> route table -> (IGW / NAT / TGW / VPC endpoint / peering) -> destination, and confirm whether the destination is in-VPC, cross-VPC, or internet-bound.
2. Walk the path one hop at a time. For each hop check: does a route exist for the destination CIDR? Is the next-hop target correct and attached? Do the NACL rules permit the flow (remember NACLs are stateless: check BOTH directions, including ephemeral ports 1024-65535)? Quote the specific rule or missing route as evidence.
3. Distinguish symptoms: a timeout points to routing / NACL / security group drops; a connection refused points to the target (no listener, wrong port); one-way traffic points to a stateless NACL missing the return path or asymmetric routing.
4. If a design is being created rather than debugged, propose the subnet split (public/private/data tiers across AZs), route tables, and NACL baseline, explaining each routing decision.

Output as: (a) the intended-path diagram in text, (b) a per-hop checklist with PASS/FAIL and the evidence line, (c) the single most likely root cause, (d) the minimal fix as specific route/NACL/SG changes. Recommend confirming with Reachability Analyzer before changing anything.

Never widen a NACL or route to 0.0.0.0/0 to "make it work" without justifying it; never modify production route tables without a tested plan and review.
Why this prompt works
VPC connectivity problems feel mysterious because AWS enforces reachability across at least five independent layers (security groups, NACLs, route tables, the next-hop gateway, and the target’s own listener), and a failure at any one of them breaks the connection, usually without an error that names the layer. Engineers tend to fixate on whichever layer they touched last. This prompt imposes a fixed traversal order, so the model checks every hop in sequence and produces a per-hop PASS/FAIL with the exact rule or missing route as evidence, rather than jumping to a guess.
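To make the per-hop PASS/FAIL idea concrete, here is a minimal Python sketch of an ordered checklist in which each hop returns a verdict plus an evidence string. It is an illustration, not AWS tooling: the function names `check_route` and `walk_path`, the dictionary layout of a route, and the target `nat-EXAMPLE` are all assumptions made for this example. The route check picks the most specific matching CIDR, which is how VPC route tables choose among overlapping routes.

```python
import ipaddress


def check_route(routes, dest_ip):
    """Return (passed, evidence) using the most specific route that covers dest_ip."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [r for r in routes if ip in ipaddress.ip_network(r["cidr"])]
    if not matches:
        return False, f"no route covers {dest_ip}"
    best = max(matches, key=lambda r: ipaddress.ip_network(r["cidr"]).prefixlen)
    return True, f"{best['cidr']} -> {best['target']}"


def walk_path(hops):
    """Run every hop check in the given order; never stop at the first failure."""
    report = []
    for name, check in hops:
        passed, evidence = check()
        report.append((name, "PASS" if passed else "FAIL", evidence))
    return report


# Hypothetical route table for a private subnet (values are placeholders).
routes = [
    {"cidr": "10.0.0.0/16", "target": "local"},
    {"cidr": "0.0.0.0/0", "target": "nat-EXAMPLE"},
]

hops = [
    ("route to 10.0.2.15", lambda: check_route(routes, "10.0.2.15")),
    # Same lookup against a table that lacks the default route.
    ("route to 192.0.2.10", lambda: check_route(routes[:1], "192.0.2.10")),
]

for name, status, evidence in walk_path(hops):
    print(f"{status}  {name}: {evidence}")
```

Security group, NACL, and listener checks would slot into the same `hops` list as further `(name, check)` pairs, so the report always covers every layer in a fixed order.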
The stateless-NACL trap is a common reason a connectivity fix “works” in a quick test and then fails in production. Security groups are stateful, so engineers internalize that return traffic is automatic. NACLs are not stateful: the reply is addressed to the client’s ephemeral port, and a rule written only for the service port in one direction never covers it. By explicitly forcing a check of both directions and the 1024-65535 ephemeral range, the prompt catches asymmetric-NACL bugs that otherwise burn hours.
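The two-direction check can be sketched in a few lines of Python. This is a simplified model under stated assumptions: rules are plain dictionaries invented for this example, only TCP is considered, the port range is matched against the packet's destination port, and the helper names `nacl_allows` and `flow_ok` are hypothetical. It follows the documented NACL behavior of evaluating rules in ascending rule number, applying the first match, and denying anything that matches no rule.

```python
import ipaddress


def nacl_allows(rules, peer_ip, dest_port):
    """First matching rule (lowest number) decides; no match means deny."""
    ip = ipaddress.ip_address(peer_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        low, high = rule["ports"]
        if ip in ipaddress.ip_network(rule["cidr"]) and low <= dest_port <= high:
            return rule["action"] == "allow"
    return False  # stands in for the implicit catch-all deny rule


def flow_ok(inbound, outbound, client_ip, service_port, client_port):
    """Check the request and the reply separately, as a stateless filter does."""
    request = nacl_allows(inbound, client_ip, service_port)  # client -> server
    reply = nacl_allows(outbound, client_ip, client_port)    # server -> client
    return request, reply


# Hypothetical NACL on the server's subnet (all values are placeholders).
server_inbound = [
    {"number": 100, "cidr": "10.0.1.0/24", "ports": (5432, 5432), "action": "allow"},
]
# Broken: the outbound rule mirrors the service port instead of the ephemeral range.
server_outbound_broken = [
    {"number": 100, "cidr": "10.0.1.0/24", "ports": (5432, 5432), "action": "allow"},
]
# Fixed: replies may go to the client's ephemeral port.
server_outbound_fixed = [
    {"number": 100, "cidr": "10.0.1.0/24", "ports": (1024, 65535), "action": "allow"},
]

for label, outbound in [("broken", server_outbound_broken), ("fixed", server_outbound_fixed)]:
    request_ok, reply_ok = flow_ok(server_inbound, outbound, "10.0.1.10", 5432, 49152)
    print(label, "request:", request_ok, "reply:", reply_ok)
```

With the broken outbound list the request passes and the reply is denied, which is the one-way pattern the prompt asks the model to look for; widening the outbound port range to 1024-65535 for the client subnet lets both directions pass.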
The symptom-classification step turns vague reports into direction. A timeout, a refused connection, and one-way traffic each implicate a different layer, so naming the symptom narrows the search before the path walk even begins. Combined with the recommendation to confirm via Reachability Analyzer before changing anything, the prompt keeps the engineer making the routing decisions while the model supplies disciplined, evidence-backed analysis.
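The symptom-to-layer mapping from step 3 of the prompt can be written down as a small lookup table. The sketch below is only a restatement of that step in Python; the names `SUSPECTS` and `suspects_for` are made up for this example, and the fallback covers symptoms such as "intermittent" that the prompt lists but does not classify.

```python
# Symptom -> layers to inspect first, taken from step 3 of the prompt.
SUSPECTS = {
    "timeout": ["route table", "NACL", "security group"],
    "refused": ["target listener", "target port"],
    "one-way": ["NACL return path", "asymmetric routing"],
}


def suspects_for(symptom):
    """Return the layers to check first; unknown symptoms get a full path walk."""
    key = symptom.strip().lower()
    return SUSPECTS.get(key, ["unclassified: walk every hop"])


for symptom in ["Timeout", "refused", "one-way", "intermittent"]:
    print(symptom, "->", ", ".join(suspects_for(symptom)))
```

The lookup only orders the search. Every hop still gets checked, because a timeout caused by a missing route and one caused by a NACL deny look identical from the client.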