OpenStack Troubleshooting Guides
Command-driven runbooks for the OpenStack failures that actually page you, written for engineers running production Kolla-Ansible clouds. Each guide walks the request path with copy-paste diagnostics, safe remediation, and validation steps, so you can localize the fault instead of guessing. Two of the guides (504 Gateway Timeout and RabbitMQ RPC Timeout) come with a free, print-ready runbook pack.
Guides
OpenStack 504 Gateway Timeout
504 Gateway Time-out (nginx/HAProxy) on Horizon or the OpenStack APIs
Horizon and API calls returning 504 through HAProxy — isolate the slow backend across Keystone, Nova, Cinder, Neutron, RabbitMQ, and MariaDB.
neutron-l3-agent Dead / XXX State
neutron agent-list shows the L3 agent as XXX (dead); floating IPs stop working
L3 agent showing XXX / dead in the agent list, routers down and floating IPs unreachable — heartbeat, RPC, and namespace triage.
Cinder Scheduler Timeout
MessagingTimeout in cinder-scheduler; "No valid backend" / filtering removed all hosts
Volumes stuck in the "creating" state, get-pools timing out, "filtering removed all hosts": trace the API → scheduler → volume RPC path.
Kolla-Ansible Certificate Update
Expired or rotated TLS certs on the HAProxy VIP / internal + external endpoints
Rotate internal/external/HAProxy TLS certificates with kolla-ansible — backup, reconfigure the right services, validate, and roll back safely.
RabbitMQ RPC Timeout in OpenStack
MessagingTimeout / "missed heartbeats from client" across OpenStack services
oslo.messaging MessagingTimeout and missed heartbeats across Nova, Cinder, Neutron, and Heat — diagnose the queue, not just the service.
Free runbook packs
Print-ready incident runbooks: every command, an escalation workflow, and an incident notes template in one PDF per pack. Enter your email and the PDF downloads immediately. No account required.
OpenStack 504 Gateway Runbook Pack
A print-ready incident runbook for chasing 504s across HAProxy, Horizon, Keystone, Nova/Cinder/Neutron, RabbitMQ, and MariaDB.
- 504 triage checklist (top-to-bottom)
- HAProxy, Horizon, and Keystone checks
- Nova / Cinder / Neutron API checks
- RabbitMQ + MariaDB latency checks
- Kolla-Ansible container restart commands
- Escalation workflow + incident notes template
No account needed · single opt-in · we never share your email.
RabbitMQ RPC Timeout Runbook Pack
A copy-paste runbook for oslo.messaging timeouts and missed heartbeats: cluster health, queue depth, and a service restart decision tree.
- OpenStack RPC timeout checklist
- RabbitMQ cluster + queue depth commands
- Consumer / publisher + heartbeat checks
- oslo.messaging config review
- Nova/Cinder/Neutron/Heat symptom matrix
- Service restart decision tree + notes template
More for OpenStack engineers
- AI Prompt Library — copy-paste prompts for Nova, Neutron, Cinder, Keystone, and RabbitMQ.
- Free DevOps tools — validators and in-browser assistants.
- AI Incident Response — paste an error, get a triage plan.
- DevOps guides — the full OpenStack + RabbitMQ knowledge base.