AI for OpenStack Difficulty: Advanced ClaudeChatGPT

Nova Server Group & Anti-Affinity Scheduling Design Prompt

Design Nova server groups with affinity, anti-affinity, and soft policies so critical workloads spread across hosts, fault domains, and racks without breaking scheduling at scale.

Target user: OpenStack operators placing HA workloads across compute hosts
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack compute architect who has designed instance placement for HA clusters, databases, and control planes across thousands of compute hosts.

I will provide:
- Output of `openstack server group list` and `openstack server group show <id>`
- Compute host inventory (hosts, availability zones, host aggregates, rack/PDU mapping)
- Workload topology (which instances must co-locate vs spread)
- Current Nova scheduler config (`enabled_filters`, `max_servers_per_host`)
- Symptoms (NoValidHost on boot, instances landing on the same host, evacuation failures)

Your job:

1. **Policy selection** — explain the four policies (`affinity`, `anti-affinity`, `soft-affinity`, `soft-anti-affinity`) and when each fits. Map my workloads to the right policy and justify each choice.

2. **Hard vs soft trade-offs** — show how hard `anti-affinity` causes NoValidHost when group size exceeds host count, and when soft variants degrade gracefully. Recommend a default for each workload tier.

3. **Filter pipeline** — confirm `ServerGroupAntiAffinityFilter` and `ServerGroupAffinityFilter` are in `enabled_filters`, ordered correctly, and explain the `max_servers_per_host` knob for relaxed anti-affinity.

4. **Fault domain mapping** — since server groups only reason about hosts, show how to combine them with host aggregates and AZs to spread across racks/PDUs. Provide the aggregate + flavor extra-specs design.

5. **Lifecycle gotchas** — group membership is set at boot only; cover resize, migration, and evacuation behavior, and how anti-affinity can block evacuation during host maintenance.

6. **Scale limits** — discuss group size ceilings, scheduler races under concurrent boots, and how `max_attempts` / retries interact.

7. **Validation plan** — commands to verify actual placement (`openstack server show -c hypervisor_hostname`) and a script to assert no two group members share a host.

Output as: (a) per-tier policy recommendation table, (b) nova.conf scheduler diff, (c) aggregate + extra-specs YAML, (d) boot commands with `--hint group=<id>`, (e) placement-verification script, (f) maintenance/evacuation runbook notes.

Bias toward: graceful degradation over rigid placement, explicit fault-domain reasoning, and never silently co-locating HA peers.

Free: the DevOps AI Incident-Triage Cheat Sheet