Security Group and ALB/NLB Connectivity Triage Prompt

Trace why traffic fails through a security-group chain or a load balancer by walking the path from client to listener to target group to target security group, then reading the health checks.

Target user

Cloud and network engineers debugging AWS load balancer connectivity

Difficulty

Intermediate

Tools

Claude, ChatGPT, Cursor

You are a senior AWS network engineer. You triage load-balancer connectivity by walking the path in order (client SG -> LB SG/listener -> target group -> target SG -> target listener), and you treat security-group references and health checks as the usual culprits.

I will provide:
- The load balancer type (ALB/NLB), listener config, and target group settings: [LB_CONFIG]
- The security groups in play (LB SG, target SG) with their inbound/outbound rules: [SECURITY_GROUPS]
- Target health status and health-check config (path, port, protocol, thresholds): [TARGET_HEALTH]
- The symptom (502/503/504, all targets unhealthy, timeout, intermittent): [SYMPTOM]

Do the following, numbered:
1. Walk the path one hop at a time. For each hop, confirm the security group permits the flow and, crucially, check whether the target SG references the LB's SG (or the correct CIDR) on the target port. Quote the rule that allows or blocks it.
2. Decode the LB error codes literally (these are HTTP codes returned by an ALB; an NLB does not generate them): 502 = bad/closed response from the target (app crashed, wrong protocol, keep-alive mismatch); 503 = no healthy or registered targets (registration or health-check failing); 504 = the target did not connect or respond in time (slow app, or a target SG or network ACL blocking the LB-to-target path). Map the symptom to the stage.
3. Diagnose unhealthy targets: is the health-check path/port/protocol correct, does the target SG allow the LB's health-check traffic, and is the success-code matcher right? Distinguish a failing health check from a target that never registered.
4. For NLB specifically, remember that by default it preserves the client source IP for instance targets, so the target SG must allow the CLIENT CIDR, not the NLB. Call this out if it applies.

Output as:
(a) the per-hop SG checklist with PASS/FAIL and the quoted rule,
(b) the decoded error code mapped to the failing stage,
(c) the health-check verdict,
(d) the minimal fix as a specific SG rule or health-check change.

Recommend confirming with Reachability Analyzer or a direct curl to the target before changing rules.

Never open a security group to 0.0.0.0/0 to bypass a chain problem; reference the LB or client SG/CIDR precisely. Never edit production security groups without a reviewed, scoped change.

Why this prompt works

Load-balancer connectivity problems span a chain of security groups and a multi-stage forwarding path, so the only reliable approach is to walk it hop by hop. The decisive detail that engineers miss most often is the security-group reference: the target’s SG must allow the load balancer’s SG on the target port, and getting that reference wrong produces a silent drop with no obvious error. This prompt forces a per-hop checklist that quotes the allowing or blocking rule, which surfaces the broken reference instead of leaving it to guesswork.
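The per-hop check described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for Reachability Analyzer: the rule dictionaries are assumed to be shaped like the `IpPermissions` entries returned by EC2 `DescribeSecurityGroups`, the group IDs and CIDRs are made up, and the helper name `check_hop` is hypothetical. It handles TCP/UDP port rules and IPv4 ranges only.

```python
import ipaddress

# Hypothetical inbound rules for a target security group, shaped like the
# IpPermissions entries in an EC2 DescribeSecurityGroups response.
TARGET_SG_INBOUND = [
    {
        "IpProtocol": "tcp",
        "FromPort": 8080,
        "ToPort": 8080,
        "UserIdGroupPairs": [{"GroupId": "sg-0aaa1111"}],  # made-up LB SG id
        "IpRanges": [],
    },
    {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "UserIdGroupPairs": [],
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],  # made-up admin CIDR
    },
]


def check_hop(inbound_rules, port, source_sg=None, source_cidr=None, protocol="tcp"):
    """Return ("PASS", rule) for the first rule admitting the flow, else ("FAIL", None)."""
    for rule in inbound_rules:
        proto = rule.get("IpProtocol")
        if proto != "-1":  # "-1" means all protocols and all ports
            if proto != protocol:
                continue
            if not (rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)):
                continue
        # A security-group reference matches on the source group's id.
        sg_ids = {pair["GroupId"] for pair in rule.get("UserIdGroupPairs", [])}
        if source_sg is not None and source_sg in sg_ids:
            return "PASS", rule
        # A CIDR rule matches when the source range sits inside the allowed range.
        if source_cidr is not None:
            wanted = ipaddress.ip_network(source_cidr)
            for ip_range in rule.get("IpRanges", []):
                if wanted.subnet_of(ipaddress.ip_network(ip_range["CidrIp"])):
                    return "PASS", rule
    return "FAIL", None


verdict, rule = check_hop(TARGET_SG_INBOUND, 8080, source_sg="sg-0aaa1111")
print(verdict, rule)
```

Returning the matching rule alongside the verdict mirrors the prompt's requirement to quote the rule that allows the flow; a FAIL with no rule is the broken-reference case.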

The load-balancer error codes are a precise diagnostic language that most people read too loosely. A 502 means the target returned a bad or closed response, usually an app crash or a protocol mismatch. A 503 means there are no healthy or registered targets, pointing at registration or health checks. A 504 means the target did not connect or respond in time, often a slow app or a target SG or network ACL blocking the path between the load balancer and the target (security groups are stateful, so an SG that admitted the request does not drop its response). By mapping the exact code to the failing stage, the prompt turns a vague “the site is down” into a narrow hypothesis before any rule is touched.
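The code-to-stage mapping can be written down as a small lookup table. The sketch below is illustrative only: the stage labels and the `decode` helper are hypothetical names, and the table covers just the three codes discussed here, which a real triage would treat as a starting hypothesis rather than a verdict.

```python
# Maps an HTTP status returned by the load balancer to the stage to inspect
# first. Labels are illustrative, not AWS terminology.
STAGE_BY_CODE = {
    502: ("target response", "app crash, wrong protocol, or keep-alive mismatch"),
    503: ("target availability", "no usable targets: check registration and health checks"),
    504: ("target connection or latency", "slow app, or SG/network ACL blocking the LB-to-target path"),
}


def decode(status_code):
    """Return (stage, first things to check) for a load-balancer status code."""
    return STAGE_BY_CODE.get(
        status_code,
        ("unmapped", "not one of 502/503/504: inspect the application's own response"),
    )


for code in (502, 503, 504):
    stage, hint = decode(code)
    print(f"{code}: {stage} -> {hint}")
```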

The NLB source-IP behavior is the trap that catches even experienced engineers. Unlike an ALB, an NLB with instance targets preserves the client’s original source IP by default, so the target security group must allow the client CIDR rather than the load balancer’s SG. A rule that allows only the LB will apply cleanly and then fail in production. Calling this out explicitly, alongside guardrails against 0.0.0.0/0 shortcuts and a recommendation to confirm with Reachability Analyzer or a direct curl, keeps the fix scoped, correct, and under the engineer’s control.
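A short sketch makes the source-IP difference concrete. It assumes, as the paragraph above does, that client IP preservation is on for NLB instance targets unless stated otherwise; the function names, IP addresses, and CIDRs are hypothetical, and the actual default for other target types and protocols should be checked on the target group itself.

```python
import ipaddress


def source_seen_by_target(lb_type, target_type, client_ip, lb_node_ip, preserve_client_ip=None):
    """Return the source address the target's security group evaluates."""
    if lb_type == "ALB":
        # An ALB opens its own connection to the target.
        return lb_node_ip
    if preserve_client_ip is None:
        # Assumption for this sketch: preserved for instance targets only.
        preserve_client_ip = target_type == "instance"
    return client_ip if preserve_client_ip else lb_node_ip


def cidrs_allow(cidrs, ip):
    """True if any CIDR in the list contains the given address."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


# Made-up addresses: a public client and an NLB node inside the VPC.
seen = source_seen_by_target("NLB", "instance", "203.0.113.7", "10.0.1.25")
print("target sees:", seen)
print("LB-subnet-only rule admits it:", cidrs_allow(["10.0.1.0/24"], seen))
print("client-CIDR rule admits it:", cidrs_allow(["203.0.113.0/24"], seen))
```

With the client IP preserved, a target rule scoped only to the load balancer's subnet does not match the packet's source, which is the silent failure the prompt asks the model to flag.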

Related prompts

VPC Connectivity Design and Debug Prompt
