Debugging GCP Load Balancers With AI: Backends and Health Checks

The page said the service was down. Every backend in the managed instance group was showing unhealthy in the load balancer, the on-call had already restarted the whole group twice, and nothing changed. The backends were fine. The health check was hitting / and the app returned a 302 redirect to /login there, which the health check counted as a failure because it expected a 200. The instances were healthy; the load balancer just couldn’t tell. This is the most common GCP load balancer incident there is, and it’s emblematic of all of them: the failure is somewhere in the request path, and if you don’t know how to read the path you end up restarting healthy things. AI is good at reading the path because the path is well-defined.

Localize the failure before you touch anything

A GCP load balancer request flows through a forwarding rule, target proxy, URL map, backend service, and finally a backend gated by a health check. The HTTP status the client sees tells you roughly which layer broke, if you know how to read it. So the first prompt is always about localization, not fixes.

Prompt: “We’re seeing 502s from a global external Application Load Balancer. Explain what a 502 means in the GCP LB request path versus a 503, so I know whether to look at the backend’s response/timeout (502) or at health and capacity (503). Then tell me what to collect to confirm which layer is failing before I change anything.”

A 502 generally means a backend was reached but returned something bad, closed the connection, or exceeded the backend service timeout. A 503 generally means there’s no healthy backend or capacity is exceeded. Getting that distinction right at the start points the entire investigation in the correct direction and stops the reflexive backend restart.
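One way to keep that distinction in front of you during an incident is a small lookup table. The sketch below is a minimal Python illustration of the 502-versus-503 reading above; the function name and the wording of each hint are my own assumptions, not an official GCP mapping, and any other status falls outside what it covers.

```python
# Illustrative triage table for a GCP Application Load Balancer.
# The layers follow the 502/503 distinction in the text; the hint
# wording and the names used here are assumptions, not an official mapping.
TRIAGE = {
    502: (
        "backend response or timeout",
        ["backend service timeout", "backend response times", "backend logs"],
    ),
    503: (
        "health and capacity",
        ["get-health output", "health-check config", "capacity settings"],
    ),
}


def first_place_to_look(status: int) -> str:
    """Return a one-line hint about which layer to inspect first."""
    entry = TRIAGE.get(status)
    if entry is None:
        return f"{status}: not covered by this sketch; read the request path layer by layer"
    layer, evidence = entry
    return f"{status}: start with {layer}; collect " + ", ".join(evidence)


print(first_place_to_look(502))
print(first_place_to_look(503))
```

The table is deliberately tiny: it encodes only the two cases the article distinguishes, so it cannot suggest a fix before the failing layer is confirmed.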

Health-check truth

Most “all backends unhealthy” incidents come down to a health check that doesn’t match what the backend actually serves: wrong port, wrong path, or a path that returns a redirect or auth challenge instead of a 200. Most of the rest are a firewall that doesn’t allow GCP’s health-check probe ranges to reach the backends. I check both together.

gcloud compute backend-services get-health my-backend --global
gcloud compute health-checks describe my-backend-hc --format=yaml
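With the output of those two commands in hand, the comparison itself is mechanical. Here is a minimal Python sketch, assuming a health check that accepts only a 200 (as in the incident above). The class and field names are hypothetical; I fill them in by hand from the describe output and from a manual request to the app that does not follow redirects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthCheck:
    # Values copied by hand from the health-check description (hypothetical shape).
    port: int
    request_path: str


@dataclass
class Observed:
    # What the app really does, measured without following redirects.
    port: int    # port the app listens on
    status: int  # status returned for the health check's request path


def mismatches(hc: HealthCheck, seen: Observed) -> List[str]:
    """List the reasons this health check would mark a working app unhealthy."""
    problems = []
    if hc.port != seen.port:
        problems.append(f"port mismatch: check probes {hc.port}, app listens on {seen.port}")
    if 300 <= seen.status < 400:
        problems.append(f"{hc.request_path} returns a redirect ({seen.status}), not 200")
    elif seen.status in (401, 403):
        problems.append(f"{hc.request_path} returns an auth challenge ({seen.status}), not 200")
    elif seen.status != 200:
        problems.append(f"{hc.request_path} returns {seen.status}, not 200")
    return problems


# The redirect incident: the check probes "/" and the app answers 302.
for problem in mismatches(HealthCheck(port=80, request_path="/"), Observed(port=80, status=302)):
    print(problem)
```

An empty list means the check and the app agree, which is the cue to look at the firewall instead.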

Prompt: “All backends show UNHEALTHY in get-health. Here’s the health-check config (protocol, port, request path, expected response) and what the app actually serves on that path. GCP health-check probes come from 35.191.0.0/16 and 130.211.0.0/22 and the firewall must allow them to the backend port. Check whether the health check matches what the app returns and whether a firewall rule could be blocking the probes. Give me the specific fix.”

In my redirect story, the model would have caught it in one pass: the health check expects 200, the path returns 302, so every backend reads as unhealthy. The fix is to point the health check at a real /healthz endpoint that returns 200, not to restart anything. If the firewall is the culprit, the fix is an ingress rule that allows only the documented probe CIDRs to reach the backend port; I never open it wider than that.
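The firewall half of that check can be done with nothing but CIDR arithmetic. The sketch below is a minimal Python example using the standard-library ipaddress module and the two probe ranges quoted in the prompt above. The function names are hypothetical, and the source ranges passed in are assumed to be copied by hand from the ingress rule that targets the backends.

```python
import ipaddress
from typing import List

# Probe ranges as quoted in the prompt above.
PROBE_RANGES = ["35.191.0.0/16", "130.211.0.0/22"]


def uncovered_probe_ranges(allowed_sources: List[str]) -> List[str]:
    """Probe ranges that no single allowed source range fully contains.

    Simplification: a probe range covered only by the union of several
    smaller allowed ranges is still reported as uncovered.
    """
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_sources]
    missing = []
    for cidr in PROBE_RANGES:
        probe = ipaddress.ip_network(cidr)
        if not any(probe.subnet_of(net) for net in allowed):
            missing.append(cidr)
    return missing


def wider_than_needed(allowed_sources: List[str]) -> List[str]:
    """Allowed source ranges that reach beyond the probe ranges."""
    probes = [ipaddress.ip_network(cidr) for cidr in PROBE_RANGES]
    return [
        cidr
        for cidr in allowed_sources
        if not any(ipaddress.ip_network(cidr).subnet_of(p) for p in probes)
    ]


print(uncovered_probe_ranges(["35.191.0.0/16"]))  # one probe range is missing
print(wider_than_needed(["0.0.0.0/0"]))           # open to everything
```

The second function exists for the "never wider than that" rule: a rule that allows 0.0.0.0/0 does let the probes through, but it is flagged rather than accepted.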

Uneven balancing and slow-endpoint 502s

When traffic piles onto one backend or capacity feels exhausted, the cause is usually the balancing mode and capacity settings, or session affinity, or simply that only one backend is healthy.

Prompt: “Traffic is concentrated on one backend instead of spreading evenly. Here’s the backend service config — balancing mode (RATE/UTILIZATION/CONNECTION), max capacity, and session affinity. Explain the likely cause of the uneven distribution and what to change, and warn me if the change could overload the remaining backends.”

That warning matters. Bumping max capacity or switching balancing mode can shift load onto fewer backends and cascade into an overload, so I confirm there’s headroom before applying. Slow-endpoint 502s are a different beast: if the backend’s real response time exceeds the backend service timeout, the LB cuts the connection and returns a 502 that looks like a backend crash but is really a timeout mismatch.

Prompt: “Some requests to a slow endpoint return 502 under load. The backend service timeout is 30s and this endpoint sometimes takes 45s. Confirm whether the timeout is the cause, and tell me whether to raise the timeout, fix the endpoint’s latency, or both — including the trade-off of a longer timeout holding connections open.”
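Before asking, I like to know how much of the traffic the timeout is actually cutting off. This is a minimal Python sketch of that arithmetic; the latency samples are invented for illustration and would come from backend logs or tracing in practice.

```python
from typing import Sequence


def share_cut_off(latencies_s: Sequence[float], timeout_s: float) -> float:
    """Fraction of requests whose backend latency exceeds the timeout."""
    if not latencies_s:
        return 0.0
    over = sum(1 for latency in latencies_s if latency > timeout_s)
    return over / len(latencies_s)


# Hypothetical latency samples in seconds for the slow endpoint.
samples = [2.0, 12.0, 28.0, 31.0, 45.0]

# With the 30s timeout from the prompt, 2 of the 5 samples are cut off.
print(share_cut_off(samples, 30.0))

# The same samples against a longer, hypothetical 60s timeout.
print(share_cut_off(samples, 60.0))
```

A non-zero share at the current timeout and zero at a longer one is consistent with a timeout mismatch, though it says nothing about whether holding connections open that long is affordable.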

NEG and MIG specifics

For serverless or zonal network endpoint groups, an “unhealthy” or “no backends” result often traces back to endpoints registered in the wrong region or not registered at all. For managed instance groups, I watch for autohealing fighting the load balancer with a separate, conflicting health probe; the two can ping-pong instances in and out. AI is handy for reconciling the two health configurations and spotting the conflict.
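Reconciling the two configurations starts with a plain field-by-field diff. The following is a minimal Python sketch, assuming each health check has been reduced by hand to a dictionary; the field names are hypothetical labels of mine, not the exact keys gcloud prints. A difference is not automatically a bug, since the two checks can be intentionally tuned differently, so the output is a list to review rather than a verdict.

```python
from typing import Dict, Tuple

# Hypothetical field names for a hand-built summary of each health check.
FIELDS = ("protocol", "port", "request_path", "check_interval_s", "unhealthy_threshold")


def differing_fields(lb_check: dict, autohealing_check: dict) -> Dict[str, Tuple]:
    """Map each field that differs to its (load balancer, autohealing) values."""
    return {
        field: (lb_check.get(field), autohealing_check.get(field))
        for field in FIELDS
        if lb_check.get(field) != autohealing_check.get(field)
    }


lb_check = {"protocol": "HTTP", "port": 8080, "request_path": "/healthz",
            "check_interval_s": 5, "unhealthy_threshold": 2}
autohealing_check = {"protocol": "HTTP", "port": 8080, "request_path": "/",
                     "check_interval_s": 5, "unhealthy_threshold": 2}

for field, (lb_value, ah_value) in differing_fields(lb_check, autohealing_check).items():
    print(f"{field}: load balancer={lb_value!r}, autohealing={ah_value!r}")
```

In this invented example the only difference is the request path, which is exactly the kind of mismatch that can have autohealing recreate instances the load balancer considers healthy.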

The honest division of labor

AI is fast at the part that wastes the most human time: mapping a status code to the failing layer, and checking whether a health check actually matches what the backend serves. Those are deterministic relationships, which is why the model is reliable on them. What it can’t see is your live traffic shape or whether the remaining backends have headroom — so it tells me which layer is broken and I decide whether a capacity change is safe before I make it.
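The headroom judgment that stays with me is simple arithmetic once I have real numbers. Below is a minimal Python sketch, assuming a RATE-style view of capacity; the request rates, the per-backend maximum, and the 0.8 safety margin are all hypothetical values chosen for illustration.

```python
def fits_after_change(
    total_rps: float,
    max_rps_per_backend: float,
    backends_after_change: int,
    safety_margin: float = 0.8,  # assumed margin: plan to use at most 80% of capacity
) -> bool:
    """True if current traffic fits on the backends left after the change."""
    if backends_after_change <= 0:
        return False
    usable_capacity = max_rps_per_backend * backends_after_change * safety_margin
    return total_rps <= usable_capacity


# Hypothetical numbers: 900 requests/s in total, 500 requests/s per backend.
print(fits_after_change(900, 500, 3))  # three backends remain
print(fits_after_change(900, 500, 2))  # only two backends remain
```

The inputs are the part the model cannot see, so I take them from live monitoring rather than from the backend service config.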

The rule I hold to: reason from the request path, and don’t recreate backends until the health check and firewall path are confirmed. The reusable version lives in my prompts library, and the GCP with AI series covers the layer underneath, including VPC firewall and routing debugging for when the probes can’t reach the backends at all. The load balancer’s errors are generic; the path that produced them is not.
