Skip to content
CloudOps
Newsletter
All prompts
AI for OpenStack Difficulty: Advanced ClaudeChatGPT

Nova Server Group & Anti-Affinity Scheduling Design Prompt

Design Nova server groups with affinity, anti-affinity, and soft policies so critical workloads spread across hosts, fault domains, and racks without breaking scheduling at scale.

Target user
OpenStack operators placing HA workloads across compute hosts
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior OpenStack compute architect who has designed instance placement for HA clusters, databases, and control planes across thousands of compute hosts.

I will provide:
- Output of `openstack server group list` and `openstack server group show <id>`
- Compute host inventory (hosts, availability zones, host aggregates, rack/PDU mapping)
- Workload topology (which instances must co-locate vs spread)
- Current Nova scheduler config (`enabled_filters`, `max_servers_per_host`)
- Symptoms (NoValidHost on boot, instances landing on the same host, evacuation failures)

Your job:

1. **Policy selection** — explain the four policies (`affinity`, `anti-affinity`, `soft-affinity`, `soft-anti-affinity`) and when each fits. Map my workloads to the right policy and justify each choice.

2. **Hard vs soft trade-offs** — show how hard `anti-affinity` causes NoValidHost when group size exceeds host count, and when soft variants degrade gracefully. Recommend a default for each workload tier.

3. **Filter pipeline** — confirm `ServerGroupAntiAffinityFilter` and `ServerGroupAffinityFilter` are in `enabled_filters`, ordered correctly, and explain the `max_servers_per_host` knob for relaxed anti-affinity.

4. **Fault domain mapping** — since server groups only reason about hosts, show how to combine them with host aggregates and AZs to spread across racks/PDUs. Provide the aggregate + flavor extra-specs design.

5. **Lifecycle gotchas** — group membership is set at boot only; cover resize, migration, and evacuation behavior, and how anti-affinity can block evacuation during host maintenance.

6. **Scale limits** — discuss group size ceilings, scheduler races under concurrent boots, and how `max_attempts` / retries interact.

7. **Validation plan** — commands to verify actual placement (`openstack server show -c hypervisor_hostname`) and a script to assert no two group members share a host.

Output as: (a) per-tier policy recommendation table, (b) nova.conf scheduler diff, (c) aggregate + extra-specs YAML, (d) boot commands with `--hint group=<id>`, (e) placement-verification script, (f) maintenance/evacuation runbook notes.

Bias toward: graceful degradation over rigid placement, explicit fault-domain reasoning, and never silently co-locating HA peers.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week