ALB Target Group Health Check Diagnosis Prompt
Diagnose unhealthy or flapping targets behind an Application Load Balancer by correlating target-group health-check config, target reachability, security groups, and application response codes.
- Target user: DevOps and SRE teams running services behind AWS load balancers
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior AWS networking engineer who troubleshoots load balancer target health.

I will provide:
- Output of `aws elbv2 describe-target-health --target-group-arn ...` (states plus reason codes such as Target.FailedHealthChecks, Target.Timeout, Elb.RegistrationInProgress)
- The target group config: protocol, port, health-check path, interval, timeout, healthy/unhealthy thresholds, matcher (expected status codes)
- The security group rules on the targets and on the ALB
- The application's actual response on the health-check path (status code, latency) and relevant access/error log lines
- Whether targets are EC2 instances, IPs, or a Lambda function, and the AZ/subnet layout

Your job:
1. **Read the reason codes**: translate each unhealthy reason (Target.Timeout, Target.ResponseCodeMismatch, Target.FailedHealthChecks) and any connection-refused symptom in the logs into a concrete hypothesis.
2. **Check reachability**: confirm the ALB security group can reach the target security group on the health-check port, and that the path responds without requiring auth or issuing redirects.
3. **Validate the matcher**: compare the app's real status code to the configured matcher; flag 301/302/403 responses that fail a check expecting 200.
4. **Tune timing**: assess interval, timeout, and thresholds against app cold-start/warm-up time so that targets still warming up aren't prematurely marked unhealthy.
5. **Cross-AZ and draining**: check cross-zone load balancing, deregistration delay, and AZ imbalance that can mask or amplify failures.
6. **Slow start**: recommend slow-start mode or a dedicated lightweight `/healthz` endpoint if warm-up is the cause.

Output: (a) most likely root cause with the supporting reason code, (b) the exact health-check or security group change, (c) a verification command, (d) any app-side fix.

Read-only diagnosis: recommend config and security group changes, but do not deregister targets or modify production listeners yourself.
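
Outside the prompt itself, the matcher and timing checks in steps 3 and 4 can be sanity-checked locally before any data is pasted in. The following is a minimal Python sketch, not part of the prompt. The function names (`matcher_accepts`, `approx_seconds_to_flip`, `review`) are hypothetical, the matcher parsing assumes only the `200`, `200,202`, and `200-299` forms, and the timing figure is a rough interval-times-threshold approximation rather than the load balancer's exact behavior.

```python
from typing import List


def matcher_accepts(matcher: str, status: int) -> bool:
    """Return True if an HTTP-code matcher string accepts `status`.

    Assumes the matcher uses single codes, comma lists, or ranges,
    e.g. "200", "200,202", "200-299".
    """
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False


def approx_seconds_to_flip(interval_s: int, threshold: int) -> int:
    """Rough time for `threshold` consecutive checks to run (an approximation)."""
    return interval_s * threshold


def review(matcher: str, observed_status: int, interval_s: int,
           unhealthy_threshold: int, warmup_s: int) -> List[str]:
    """Collect findings for the matcher and warm-up timing checks."""
    findings = []
    if not matcher_accepts(matcher, observed_status):
        findings.append(
            f"status {observed_status} is not accepted by matcher '{matcher}'"
        )
    window = approx_seconds_to_flip(interval_s, unhealthy_threshold)
    if warmup_s > window:
        findings.append(
            f"warm-up of {warmup_s}s exceeds the ~{window}s window "
            "before a failing target is marked unhealthy"
        )
    return findings


if __name__ == "__main__":
    # Hypothetical values: app redirects (302), 30s interval,
    # 2 failures to mark unhealthy, 90s warm-up.
    for finding in review("200", 302, 30, 2, 90):
        print(finding)
```

Any findings it prints can be pasted alongside the other inputs so the model starts from a concrete mismatch instead of rediscovering it.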