
Heat Autoscaling Group & Aodh Alarm Design Prompt

Design a Heat OS::Heat::AutoScalingGroup with scale-up and scale-down policies driven by Aodh alarms on Ceilometer/Gnocchi metrics, including cooldowns, signaling, and safe stack updates.

Target user: Cloud engineers building elastic workloads on OpenStack Heat orchestration
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack orchestration engineer who has shipped production autoscaling stacks driven by Aodh alarms.

I will provide:
- The workload to scale (stateless web tier, worker pool) and its scaling signal (CPU, queue depth, request rate)
- Telemetry availability: whether Ceilometer, Gnocchi, and Aodh are installed, and which meters are actually being published
- Current Heat template (if any) and Octavia LB setup for the members
- Scaling bounds: min/max instances, acceptable scale latency
- Pain points: oscillation, alarms never firing, stuck stack updates

Your job:

1. **Lay out the resources** — explain how `OS::Heat::AutoScalingGroup`, `OS::Heat::ScalingPolicy` (scale up/down), and `OS::Aodh::GnocchiAggregationByResourcesAlarm` connect via signal URLs, and where the LB pool member wiring lives.

2. **Author the template** — provide a working HOT skeleton: the scaling group with a nested member template, scale-up and scale-down policies with `adjustment_type`/`scaling_adjustment`, and two Aodh alarms whose `alarm_actions` point at the policy signal URLs.

3. **Pick the metric & thresholds** — choose the right Gnocchi aggregation and comparison operator, set `evaluation_periods` and `granularity` so a brief spike doesn't trigger an alarm, and set distinct up/down thresholds to avoid flapping.

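4. **Cooldowns & step scaling** — set `cooldown` on each policy so the group stabilizes between actions; explain simple scaling (one alarm, one fixed adjustment) versus step scaling (several alarm/policy pairs at different threshold bands, since a Heat scaling policy carries a single adjustment), and when more than one band is worth adding.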

5. **LB integration** — ensure new members auto-register with the Octavia pool and drain on scale-down so in-flight requests survive.

6. **Safe stack updates** — how `openstack stack update` interacts with a group that has live instances, how to avoid a mass-replace, and how the group's `rolling_updates` property (`min_in_service`, `max_batch_size`, `pause_time`) protects availability.

7. **Validate** — drive load to trip the up-alarm, watch the group grow, then idle to trip scale-down; show the `openstack stack resource list` and Aodh alarm-state checks that prove the loop works end to end.
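
For intuition on steps 3 and 4, here is a minimal Python sketch of how evaluation periods, separate up/down thresholds, and a cooldown interact. It is a toy model, not the actual Aodh or Heat logic: the function name `simulate`, the 80/30 thresholds, and the assumption of one aggregated sample per granularity window are all illustrative. It also re-fires while a breach persists, which roughly corresponds to an alarm with repeated actions enabled.

```python
def simulate(samples, up=80.0, down=30.0, periods=3, cooldown=2,
             size=2, min_size=1, max_size=5):
    """Return the group size after each sample (toy model, hypothetical names).

    samples:  one aggregated metric value per granularity window.
    periods:  an alarm fires only if the last `periods` samples all breach.
    cooldown: number of windows ignored after a scaling action.
    """
    history, sizes, wait = [], [], 0
    for value in samples:
        history.append(value)
        window = history[-periods:]
        if wait > 0:
            wait -= 1                      # still cooling down: take no action
        elif len(window) == periods:
            if all(v > up for v in window) and size < max_size:
                size += 1                  # sustained breach of the upper threshold
                wait = cooldown
            elif all(v < down for v in window) and size > min_size:
                size -= 1                  # sustained dip below the lower threshold
                wait = cooldown
        sizes.append(size)
    return sizes


print(simulate([50, 95, 50, 50, 50]))  # brief spike: [2, 2, 2, 2, 2]
print(simulate([90] * 6))              # sustained load: [2, 2, 3, 3, 3, 4]
print(simulate([10] * 6))              # idle, floor at min_size: [2, 2, 1, 1, 1, 1]
```

The gap between `up` and `down` is the dead band: values between the two thresholds never move the group, which is what keeps it from oscillating around a single threshold.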

Output as: (a) a resource/signal diagram in text, (b) the full HOT template, (c) the Aodh alarm definitions with chosen thresholds and rationale, (d) a load-test plan to validate both directions, (e) a stack-update safety checklist.

Call out every place where a wrong granularity or missing meter makes alarms silently never fire.
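
One way to act on that last instruction is a pre-flight check that compares each alarm's metric and granularity with what the metric store actually keeps. The sketch below is plain Python over hypothetical data shapes: the `alarms` list and the `stored_granularities` mapping (metric name to the granularities, in seconds, that its archive policy retains) are assumptions for illustration. In a real deployment those values would have to be pulled from Aodh and Gnocchi.

```python
def find_silent_alarms(alarms, stored_granularities):
    """Return (alarm_name, reason) pairs for alarms that can never receive data.

    Both arguments use hypothetical shapes chosen for this sketch.
    """
    problems = []
    for alarm in alarms:
        name, metric = alarm["name"], alarm["metric"]
        available = stored_granularities.get(metric)
        if available is None:
            # Nothing publishes this metric, so the alarm has no data to evaluate.
            problems.append((name, f"metric '{metric}' is not published"))
        elif alarm["granularity"] not in available:
            # Data exists, but not at the granularity the alarm asks for.
            problems.append(
                (name, f"granularity {alarm['granularity']}s not in {sorted(available)}")
            )
    return problems


alarms = [
    {"name": "cpu_high", "metric": "cpu", "granularity": 300},
    {"name": "cpu_low", "metric": "cpu", "granularity": 60},
    {"name": "queue_high", "metric": "queue_depth", "granularity": 300},
]
stored = {"cpu": {300, 3600}}

for name, reason in find_silent_alarms(alarms, stored):
    print(f"{name}: {reason}")
```

In this example `cpu_low` and `queue_high` are flagged, while `cpu_high` passes. Alarms like the flagged ones typically sit in the `insufficient data` state rather than raising an error, which is why they are easy to miss.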
