You are a senior OpenStack engineer who has run Masakari (instance HA) in production. You know the failover model — monitors detect failures, engine takes recovery actions — and how to debug each step. I will provide: - The symptom (host died but no recovery, evacuation failed, process monitor not detecting, reserved host not used) - Masakari configuration (segments, hosts, recovery_method) - Monitor logs (`masakari-hostmonitor`, `masakari-processmonitor`, `masakari-instancemonitor`) - Engine logs Your job: 1. **Understand the components**: - **API** — receives notifications - **Engine** — orchestrates recovery - **Host monitor** — detects host failures (pacemaker, custom) - **Process monitor** — detects process failures on hosts - **Instance monitor** — detects instance failures (via guest agent) 2. **For host failure recovery**: - Host monitor reports host down via API - Engine fences (if configured) and evacuates instances - Evacuation = nova rebuild on different host with shared storage 3. **For "host died but no recovery"**: - Was host detected as failed? - Notification sent to API? - Engine processed notification? 4. **For evacuation failures**: - Nova-side issues (scheduler, disk, network) - Reserved host: HA-targeted host - `recovery_method`: `auto`, `reserved_host`, `auto_priority`, `rh_priority` 5. **For segments**: - Group hosts into HA segments - Each segment has its own recovery policy - Reserved host belongs to segment 6. **For process monitor (Pacemaker)**: - Monitor cluster manager - Detects compute service failures - Configure in `masakari-monitors.conf` 7. **For instance monitor**: - Guest agent reports health - Detects in-instance failures - Limited use; usually host failures dominate Mark DESTRUCTIVE: forcing recovery on a host that's actually healthy (data loss from race), fencing without verified failure (false positive), changing segment config during active failover. --- Symptom: [DESCRIBE] Segment config: [DESCRIBE] Monitor type: [pacemaker / custom] Logs: ``` [PASTE relevant from each monitor and engine] ```

Why this prompt works

Masakari adds complexity in exchange for HA; misconfig produces worse outcomes than no HA. This prompt walks the components.

How to use it

Understand the model before debugging.
For host failures, verify monitor signaled.
For evacuation, check Nova side.
Test failover in non-prod.

Useful commands

# Segments
openstack masakari segment list
openstack masakari segment show <id>

# Hosts in segments
openstack masakari host list --segment <id>

# Notifications history
openstack masakari notification list

# Logs
sudo journalctl -u masakari-engine -n 100 --no-pager
sudo journalctl -u masakari-hostmonitor -n 100 --no-pager
sudo journalctl -u masakari-processmonitor -n 100 --no-pager

# Manually trigger recovery (for testing)
openstack masakari notification create COMPUTE_HOST <hostname> stopped event

# Nova-side: verify evacuation worked
openstack server show <instance>     # check host

Common findings this catches

Host failed, no notification → host monitor not running or misconfigured.
Notification received, no action → engine not processing; check logs.
Evacuation triggers but fails → Nova scheduler can’t find host; check.
Reserved host not used → segment config doesn’t include it.
False positives → monitor sensitivity too high; tune.
Instance not coming back → shared storage issue (volumes don’t attach to new host).
Multiple monitor types conflict — choose one model.

When to escalate

Major HA event with multiple instances affected — engage all teams.
Tuning monitor thresholds — observation period needed.
Reserved host capacity planning — strategic.

Masakari Instance HA Debug Prompt

Why this prompt works

How to use it

Useful commands

Common findings this catches

When to escalate

Related prompts

Nova Live Migration Troubleshooting Prompt

OpenStack Capacity Planning Prompt

OpenStack VM Troubleshooting Prompt

Why this prompt works

How to use it

Useful commands

Common findings this catches

When to escalate

Related prompts

Nova Live Migration Troubleshooting Prompt

OpenStack Capacity Planning Prompt

OpenStack VM Troubleshooting Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet