GCP with AI · By James Joyner IV · 10 min read

GCP Error Guide: '502 Bad Gateway' Load Balancer Backend Unhealthy

Fix GCP load balancer 502 / backend unhealthy errors: diagnose failing health checks, firewall rules, wrong ports, backend timeouts, and NEG misconfiguration.

  • #gcp
  • #troubleshooting
  • #errors
  • #networking

Overview

A 502 Bad Gateway from a Google Cloud HTTP(S) load balancer means the load balancer accepted the client request but could not get a valid response from a healthy backend. Either no backend instances are passing health checks (so the backend service has no serving capacity), or the chosen backend closed the connection or timed out before responding. The load balancer returns 502 to the client rather than forwarding a broken response.

You will see this from the client and in the load-balancer logs:

HTTP/1.1 502 Bad Gateway
Server: Google Frontend

And in the Cloud Logging entry for the request:

jsonPayload.statusDetails: "failed_to_pick_backend"
httpRequest.status: 502

Other common statusDetails values are backend_connection_closed_before_data_sent_to_client and backend_timeout. These 502s occur on external and internal Application Load Balancers in front of MIGs, NEGs (including serverless and GKE), or hybrid backends: any time the backend pool can’t serve the request.

Symptoms

  • Clients get 502 Bad Gateway with Server: Google Frontend.
  • Backend service shows 0 healthy instances; health checks are red.
  • LB logs show statusDetails: failed_to_pick_backend, backend_timeout, or backend_connection_closed_before_data_sent_to_client.
  • Backends are actually running and serving on the VM, but the LB still marks them unhealthy.

A backend health query typically shows every instance failing:

gcloud compute backend-services get-health api-backend --global \
  --format="table(status.healthStatus[].instance.basename(), status.healthStatus[].healthState)"
INSTANCE   HEALTH_STATE
web-001    UNHEALTHY
web-002    UNHEALTHY

Common Root Causes

1. Health check failing (wrong path/port)

The health check probes a path or port the backend doesn’t serve, so every instance is marked unhealthy and the pool empties.

gcloud compute health-checks describe api-hc \
  --format="value(httpHealthCheck.requestPath, httpHealthCheck.port)"
/healthz  8080

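If the app serves health at / rather than /healthz, the probe gets a non-200 response (typically 404). If the app listens on a different port, the probe’s connection is refused or times out. Either way the backend stays UNHEALTHY.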

2. Firewall doesn’t allow the health-check probes

GCP health checks originate from fixed probe ranges (35.191.0.0/16, 130.211.0.0/22). If no firewall rule allows them to the backend port, probes never reach the instances.

gcloud compute firewall-rules list \
  --filter="sourceRanges:35.191.0.0/16 OR sourceRanges:130.211.0.0/22" \
  --format="table(name, allowed[].map().firewall_rule().list(), sourceRanges.list())"
Listed 0 items.

No rule allowing the probe ranges means health checks time out and all backends look unhealthy.
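
As a rough sketch of that check, the helper below reports which of the two probe ranges are absent from a comma-separated list of source ranges. The function name missing_probe_ranges is hypothetical, and it only does exact string matching: a broader rule that happens to contain the probe ranges would still be reported as missing.

```shell
# Report which health-check probe ranges are absent from a comma-separated
# list of source ranges. Exact string match only: no CIDR containment math.
missing_probe_ranges() {
  for range in 35.191.0.0/16 130.211.0.0/22; do
    case ",$1," in
      *",$range,"*) ;;        # range is present in the list
      *) echo "$range" ;;     # range is missing from the list
    esac
  done
}
```

One way to use it, assuming the rule's ranges have been printed as a comma-separated list (for example with the sourceRanges.list() projection shown above): missing_probe_ranges "10.0.0.0/8,35.191.0.0/16" prints 130.211.0.0/22.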

3. Backend serving on a different port than the LB expects

The backend service / named port doesn’t match the port the application actually listens on.

gcloud compute backend-services describe api-backend --global \
  --format="value(port, portName)"
gcloud compute instance-groups managed describe api-mig --zone us-central1-a \
  --format="value(namedPorts)"
80  http
{'name': 'http', 'port': 8080}

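For instance-group backends the load balancer sends traffic to the port that portName resolves to (the port field is deprecated in favor of portName). Here the named port maps http→8080, so if the app actually listens on 9000, traffic reaches a dead port, and so do health checks if they probe 8080 as well.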
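
A minimal sketch of that comparison, useful in deploy automation. Both function names are hypothetical; the input is a name:port list in the same syntax that --named-ports accepts, and the application port is whatever you know the app binds to.

```shell
# Look up a port by name in a "name:port,name:port" list.
# Prints nothing if the name is absent.
named_port_lookup() {
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F: -v name="$2" '$1 == name { print $2; exit }'
}

# Compare the named port with the port the application listens on.
# Arguments: <named-ports list> <port name> <application port>
check_port_alignment() {
  mapped=$(named_port_lookup "$1" "$2")
  if [ "$mapped" = "$3" ]; then
    echo "aligned: $2 -> $3"
  else
    echo "mismatch: $2 -> ${mapped:-<unset>}, app listens on $3"
  fi
}
```

With the values from this section, check_port_alignment "http:8080" http 9000 reports a mismatch.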

4. Backend timeout too low / slow backend

If the backend takes longer than the backend service’s response timeout, the LB aborts with backend_timeout.

gcloud compute backend-services describe api-backend --global \
  --format="value(timeoutSec)"
5

A 5-second timeout against a backend whose slow endpoints take ~8s yields 502s with statusDetails: backend_timeout.
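
The arithmetic is simple enough to script. The sketch below assumes you already know your slowest legitimate response time in whole seconds; check_backend_timeout is a hypothetical helper, not a gcloud feature.

```shell
# Compare a backend service timeout with the slowest expected response.
# Arguments: <timeoutSec> <slowest response in seconds>, both whole numbers.
check_backend_timeout() {
  if [ "$1" -gt "$2" ]; then
    echo "ok: timeout ${1}s exceeds slowest response ${2}s"
  else
    echo "too low: timeout ${1}s, slowest response ${2}s"
  fi
}
```

For the case described here, check_backend_timeout 5 8 prints "too low: timeout 5s, slowest response 8s".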

5. Backend closes the connection prematurely

Mismatched keepalive timeouts (backend’s idle timeout shorter than the LB’s) cause the backend to drop connections the LB still considers open.

gcloud logging read \
  'resource.type="http_load_balancer" AND jsonPayload.statusDetails="backend_connection_closed_before_data_sent_to_client"' \
  --project my-prod-project --limit 1 --format="value(httpRequest.status)"
502

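This statusDetails indicates the backend hung up early. Set the backend web server’s keepalive (idle) timeout above the load balancer’s backend keepalive timeout, which is fixed at 600 seconds for external Application Load Balancers; 620 seconds is a common choice.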
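
A small sketch of the keepalive comparison. The helper name check_keepalive is hypothetical; the 600-second default is the load balancer figure quoted in this guide, and the backend value is whatever your web server is configured with.

```shell
# Check that the backend's idle keepalive timeout is longer than the
# load balancer's. Arguments: <backend keepalive sec> [lb keepalive sec]
check_keepalive() {
  lb_keepalive=${2:-600}
  if [ "$1" -gt "$lb_keepalive" ]; then
    echo "ok: backend keepalive ${1}s > load balancer ${lb_keepalive}s"
  else
    echo "risk: backend keepalive ${1}s <= load balancer ${lb_keepalive}s"
  fi
}
```

A backend left at a short idle timeout, for example check_keepalive 65, is flagged as a risk because it can close a connection the load balancer still intends to reuse.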

6. Serverless/GKE NEG misconfigured

A serverless NEG pointing at the wrong Cloud Run service/region, or a GKE NEG out of sync with pods, leaves the backend with no reachable endpoints.

gcloud compute network-endpoint-groups list \
  --format="table(name, networkEndpointType, size)"
NAME          NETWORK_ENDPOINT_TYPE  SIZE
run-neg       SERVERLESS             0

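Note that a serverless NEG holds no individual endpoints, so SIZE 0 is expected for it; what matters there is that the NEG references an existing Cloud Run service in the NEG’s own region (inspect it with gcloud compute network-endpoint-groups describe). For a GKE NEG, SIZE 0 means no pod endpoints are attached, which gives the LB nothing to route to.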

Diagnostic Workflow

Step 1: Read the LB log statusDetails

gcloud logging read \
  'resource.type="http_load_balancer" AND httpRequest.status=502' \
  --project my-prod-project --limit 5 \
  --format="table(timestamp, jsonPayload.statusDetails)"

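If statusDetails is failed_to_pick_backend, no backend was healthy, so chase the health checks (Steps 2 to 4). If it is backend_timeout or backend_connection_closed_before_data_sent_to_client, a backend was picked but failed to answer, so look at the backend itself: its response times and keepalive settings.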
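
The mapping above can be sketched as a small shell function for runbooks. The name triage_502 and the hint wording are illustrative; the hints paraphrase this guide and are not gcloud output.

```shell
# Map a load-balancer statusDetails value to the next diagnostic step.
triage_502() {
  case "$1" in
    failed_to_pick_backend)
      echo "no healthy backends: check health checks, firewall rules, and ports" ;;
    backend_timeout)
      echo "backend too slow: compare the backend service timeout with response times" ;;
    backend_connection_closed_before_data_sent_to_client)
      echo "backend hung up early: compare keepalive timeouts" ;;
    *)
      echo "unrecognized statusDetails: read the full log entry" ;;
  esac
}
```

Feeding it the value from a log entry, for example triage_502 failed_to_pick_backend, prints the matching hint.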

Step 2: Check backend health

gcloud compute backend-services get-health <BACKEND_SERVICE> --global \
  --format="table(status.healthStatus[].instance.basename(), status.healthStatus[].healthState)"

All UNHEALTHY confirms a health-check/firewall/port problem.
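
To turn that output into a quick count, a sketch like the following can read it from stdin. It assumes one health state per whitespace-separated field, as in the sample output earlier in this guide; summarize_health is a hypothetical helper name.

```shell
# Count health states from get-health output read on stdin.
# Only exact HEALTHY / UNHEALTHY fields are counted, so the HEALTH_STATE
# header and instance names are ignored.
summarize_health() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "HEALTHY") healthy++
        else if ($i == "UNHEALTHY") unhealthy++
      }
    }
    END { printf "healthy=%d unhealthy=%d\n", healthy, unhealthy }
  '
}
```

Piping the two-instance sample from the Symptoms section into summarize_health yields healthy=0 unhealthy=2.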

Step 3: Verify the health check matches the app

gcloud compute health-checks describe <HEALTH_CHECK> \
  --format="value(httpHealthCheck.requestPath, httpHealthCheck.port, type)"

Confirm the path returns 200 and the port matches the app. Test from a same-VPC VM:

curl -sS -o /dev/null -w '%{http_code}\n' http://<INSTANCE_IP>:8080/healthz
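
A sketch for interpreting the code that curl prints. It relies on two facts: HTTP health checks accept only a 200 response, and curl prints 000 when it receives no HTTP response at all. The helper name interpret_probe_code is hypothetical.

```shell
# Classify the HTTP status code printed by a manual curl probe.
interpret_probe_code() {
  case "$1" in
    200) echo "pass: the health check would accept this response" ;;
    000) echo "fail: no HTTP response (closed port, firewall, or wrong IP)" ;;
    *)   echo "fail: got HTTP $1, expected 200" ;;
  esac
}
```

Keep in mind that a probe from a same-VPC VM exercises the path and port only; it does not prove that the firewall admits the health-check probe ranges.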

Step 4: Confirm firewall allows probe ranges

gcloud compute firewall-rules create allow-health-checks \
  --network=<NETWORK> --direction=INGRESS --action=ALLOW \
  --rules=tcp:8080 --source-ranges=35.191.0.0/16,130.211.0.0/22

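If no rule exists, create one as shown. If one already exists, verify that it permits 35.191.0.0/16 and 130.211.0.0/22 to the backend port and that it applies to the backend instances (same network, matching target tags if any are set).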

Step 5: Fix ports/timeouts and verify recovery

# Align named port + backend timeout
gcloud compute instance-groups managed set-named-ports api-mig \
  --named-ports=http:8080 --zone us-central1-a
gcloud compute backend-services update api-backend --global --timeout=30

Re-check get-health until instances flip to HEALTHY and 502s clear.
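
The re-check can be automated with a small polling loop. This is a sketch under assumptions: wait_until_healthy is a hypothetical helper, it takes the health command as arguments rather than hard-coding one, and it scans the output for health-state words, so an instance name containing one of those words would confuse it.

```shell
# Poll a command until its output reports only HEALTHY states.
# Usage: wait_until_healthy <max_attempts> <sleep_seconds> <command...>
wait_until_healthy() {
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Keep only health-state words; headers and instance names are dropped.
    states=$("$@" 2>/dev/null |
      grep -owE 'HEALTHY|UNHEALTHY|DRAINING|TIMEOUT|UNKNOWN') || states=""
    # Ready when at least one state was found and none differs from HEALTHY.
    if [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qvx 'HEALTHY'; then
      echo "healthy after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$sleep_seconds"
    fi
  done
  echo "not healthy after $max_attempts attempt(s)" >&2
  return 1
}
```

For example, wait_until_healthy 20 15 followed by the get-health command from Step 2 would poll every 15 seconds for up to 20 attempts and return non-zero if the instances never recover.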

Example Root Cause Analysis

After moving an app from port 8080 to 9090, clients start getting 502 Bad Gateway. The LB logs show:

gcloud logging read \
  'resource.type="http_load_balancer" AND httpRequest.status=502' \
  --project my-prod-project --limit 3 \
  --format="table(jsonPayload.statusDetails)"
STATUSDETAILS
failed_to_pick_backend
failed_to_pick_backend

failed_to_pick_backend means no healthy backends. Health is all red:

gcloud compute backend-services get-health api-backend --global \
  --format="table(status.healthStatus[].healthState)"
HEALTH_STATE
UNHEALTHY
UNHEALTHY

The app now listens on 9090, but the named port and health check still point at 8080. The probe hits a dead port, every instance fails the check, and the pool empties. Curling the instance confirms 8080 is closed and 9090 serves.

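Fix: update the named port and health-check port to 9090, and open 9090 to the probe ranges in the firewall rule:

gcloud compute instance-groups managed set-named-ports api-mig \
  --named-ports=http:9090 --zone us-central1-a
gcloud compute health-checks update http api-hc --port=9090
# Assumes the probe rule is the allow-health-checks rule from Step 4
gcloud compute firewall-rules update allow-health-checks --rules=tcp:9090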

Instances flip to HEALTHY and the 502s stop.

Prevention Best Practices

  • Make the health-check path, port, and protocol exactly match what the app serves, and keep them in sync whenever the app’s port changes.
  • Always create a firewall rule allowing the health-check probe ranges (35.191.0.0/16, 130.211.0.0/22) to the backend port; missing this rule is a classic all-backends-unhealthy cause.
  • Set the backend service timeout above your slowest legitimate response, and set the backend’s keepalive timeout higher than the LB’s to avoid premature connection closes.
  • Keep named ports, backend service ports, and the application port aligned, and validate them in deploy automation.
  • For serverless NEGs, verify the NEG points at the correct service and region; for GKE NEGs, verify that endpoints are attached (non-zero size) before sending traffic.

Quick Command Reference

# Why 502? statusDetails tells you backend vs health-check
gcloud logging read \
  'resource.type="http_load_balancer" AND httpRequest.status=502' \
  --project <PROJECT> --limit 5 --format="table(timestamp, jsonPayload.statusDetails)"

# Backend health
gcloud compute backend-services get-health <BACKEND_SERVICE> --global \
  --format="table(status.healthStatus[].instance.basename(), status.healthStatus[].healthState)"

# Health-check config
gcloud compute health-checks describe <HEALTH_CHECK> \
  --format="value(httpHealthCheck.requestPath, httpHealthCheck.port)"

# Firewall for probe ranges
gcloud compute firewall-rules create allow-health-checks --network=<NET> \
  --action=ALLOW --rules=tcp:<PORT> --source-ranges=35.191.0.0/16,130.211.0.0/22

# Align ports + timeout
gcloud compute instance-groups managed set-named-ports <MIG> \
  --named-ports=http:<PORT> --zone <ZONE>
gcloud compute backend-services update <BACKEND_SERVICE> --global --timeout=30

# NEG endpoints
gcloud compute network-endpoint-groups list \
  --format="table(name, networkEndpointType, size)"

Conclusion

A load balancer 502 Bad Gateway means the LB couldn’t get a valid response from a healthy backend. The usual root causes:

  1. The health check probes a path/port the backend doesn’t serve, emptying the pool.
  2. No firewall rule allows the health-check probe ranges to the backend.
  3. The backend serves on a different port than the LB/named port expects.
  4. The backend service timeout is shorter than the backend’s response time.
  5. The backend closes connections early due to a keepalive-timeout mismatch.
  6. A serverless/GKE NEG points at the wrong service or has no endpoints.

Read the LB log statusDetails first — failed_to_pick_backend sends you to health checks and firewalls, while backend_timeout/backend_connection_closed_... sends you to the backend itself.
