Autoscaling Clusters with OpenStack Senlin
Senlin manages homogeneous clusters of nodes with policies for scaling, health, and load balancing. Here's how I use it for real autoscaling on OpenStack.
People come to OpenStack from AWS expecting an Auto Scaling Group and a target-tracking policy, and they’re surprised when Nova doesn’t have one. Heat can scale a stack, but Heat’s scaling is clunky and stateful in ways that hurt. The service actually built for “keep N healthy nodes and scale them on a signal” is Senlin.
I’ve used Senlin to run autoscaling fleets on OpenStack for years. It’s underrated and underused, partly because people reach for Heat autoscaling first and get burned. Here’s how I run Senlin properly.
The Senlin model: profiles, clusters, policies
Three concepts and you understand Senlin:
- A profile describes a single node — usually a Nova server spec (image, flavor, networks, key). It’s the template every node is stamped from.
- A cluster is a managed group of nodes built from one profile, with a desired/min/max capacity.
- A policy attaches a behavior to a cluster — scaling, health, load-balancing, deletion order, affinity.
The power is in policies. A cluster with a health policy self-heals dead nodes. A cluster with a scaling policy responds to signals. A cluster with an lb policy registers and deregisters nodes from an Octavia pool automatically. You compose these.
Building a cluster
Define a profile, then a cluster from it:
# web-profile.yaml
type: os.nova.server
version: 1.0
properties:
  flavor: m1.small
  image: ubuntu-22.04
  key_name: ops-key
  networks:
    - network: tenant-net

# Create the profile
openstack cluster profile create --spec-file web-profile.yaml web-profile

# Create a cluster with desired/min/max
openstack cluster create \
  --profile web-profile \
  --desired-capacity 3 \
  --min-size 2 \
  --max-size 10 \
  web-cluster

openstack cluster show web-cluster
Senlin immediately reconciles to the desired capacity by booting nodes. From here, every change is a policy or an explicit resize.
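For the explicit-resize path, here is a minimal sketch using the standard client's resize and member-listing subcommands. The function names are my own wrappers, not part of Senlin, and the capacity you pass is a placeholder for whatever your fleet needs.

```shell
# Hypothetical helper names; the openstack subcommands come from python-senlinclient.

# Set an absolute desired capacity, e.g.: resize_cluster web-cluster 5
resize_cluster() {
    cluster="$1"
    capacity="$2"
    openstack cluster resize --capacity "$capacity" "$cluster"
}

# List the nodes Senlin currently tracks for the cluster.
list_members() {
    openstack cluster members list "$1"
}
```

A resize outside the cluster's min/max bounds should be rejected, so I treat the bounds set at creation time as the safety rail for manual changes too.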
Health policy: self-healing for free
The first policy I attach to any production cluster is health. It polls node status and rebuilds or recreates nodes that go unhealthy:
# health-policy.yaml
type: senlin.policy.health
version: 1.1
properties:
  detection:
    interval: 60
    detection_modes:
      - type: NODE_STATUS_POLLING
  recovery:
    actions:
      - name: RECREATE

openstack cluster policy create --spec-file health-policy.yaml health-pol
openstack cluster policy attach --policy health-pol web-cluster
Now a node that Nova reports as ERROR or that vanishes gets recreated automatically. This alone makes Senlin worth running — it’s the self-healing Nova doesn’t give you natively.
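After attaching the policy I like to confirm it actually took. A rough sketch, assuming the cluster name from above; verify_health is a hypothetical wrapper and the three subcommands are the stock client ones.

```shell
# Hypothetical helper; subcommands are from python-senlinclient.
verify_health() {
    cluster="$1"
    # Show which policies are bound to the cluster and whether they are enabled.
    openstack cluster policy binding list "$cluster"
    # Ask Senlin to refresh every node's status now rather than at the next poll.
    openstack cluster check "$cluster"
    # List the nodes with their current status.
    openstack cluster node list --cluster "$cluster"
}
```

Run it as verify_health web-cluster and check that health-pol appears in the binding list.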
Scaling policies and real autoscaling
A scaling policy defines how much to scale per signal:
type: senlin.policy.scaling
version: 1.0
properties:
  event: CLUSTER_SCALE_OUT
  adjustment:
    type: CHANGE_IN_CAPACITY
    number: 2
    min_step: 1
    cooldown: 120
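A scale-out policy on its own only grows the cluster, so in practice I pair it with a scale-in counterpart. The sketch below assumes the spec above is saved as scale-out-policy.yaml; that filename, the policy names, and the scale-in numbers are all hypothetical, so tune them to your own load.

```shell
# Hypothetical file and policy names throughout.

# Write a scale-in spec that mirrors the scale-out one, removing one node at a time.
write_scale_in_spec() {
    cat > "$1" <<'EOF'
type: senlin.policy.scaling
version: 1.0
properties:
  event: CLUSTER_SCALE_IN
  adjustment:
    type: CHANGE_IN_CAPACITY
    number: 1
    cooldown: 300
EOF
}

# Register both policies and attach them to the cluster.
attach_scaling_policies() {
    cluster="$1"
    write_scale_in_spec scale-in-policy.yaml
    openstack cluster policy create --spec-file scale-out-policy.yaml scale-out-pol
    openstack cluster policy create --spec-file scale-in-policy.yaml scale-in-pol
    openstack cluster policy attach --policy scale-out-pol "$cluster"
    openstack cluster policy attach --policy scale-in-pol "$cluster"
}
```

Scaling in by a smaller step and with a longer cooldown than scaling out is a deliberate bias: adding capacity late hurts more than removing it late.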
The signal that triggers scaling comes from outside Senlin — typically Aodh alarms on Ceilometer/Gnocchi metrics calling Senlin’s webhook (a receiver):
# Create a webhook receiver that triggers scale-out
openstack cluster receiver create \
  --type webhook \
  --cluster web-cluster \
  --action CLUSTER_SCALE_OUT \
  scale-out-hook
Point an Aodh alarm at the receiver’s URL when CPU crosses a threshold, and you have closed-loop autoscaling: metric -> alarm -> webhook -> Senlin scales. The cooldown is the single most important knob — too short and you flap, adding and removing nodes faster than they can warm up.
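Before wiring up Aodh, it helps to fire the webhook by hand and watch the cluster react. A minimal sketch: trigger_receiver is a hypothetical helper, and its argument is the alarm_url you copy from the channel field of openstack cluster receiver show scale-out-hook.

```shell
# Hypothetical helper: POST to a Senlin webhook receiver, as an alarm action would.
# $1 is the receiver's alarm_url, copied from its "channel" field.
trigger_receiver() {
    # -f fails on HTTP errors, -sS stays quiet but still reports failures.
    curl -fsS -X POST "$1"
}
```

If a second call inside the cooldown window is refused or has no effect, the cooldown is doing its job.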
Wiring in Octavia
For a web fleet, attach an lb policy so new nodes auto-register in your load balancer pool and removed nodes deregister cleanly:
# Abridged: a deployable spec also names the subnets for the pool and the VIP.
type: senlin.policy.loadbalance
version: 1.1
properties:
  pool:
    protocol: HTTP
    protocol_port: 80
  health_monitor:
    type: HTTP
    url_path: /healthz
This closes the last gap — scaling that doesn’t update the load balancer is useless. With the lb policy, scale-out adds a backend and scale-in drains and removes one.
The failure modes I watch for
- Flapping. Cooldowns too short, or scale-out and scale-in alarms with overlapping thresholds. Leave a dead band between scale-out and scale-in triggers.
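- Scale-in killing the wrong node. Without a deletion policy, Senlin picks scale-in victims at random. Attach a deletion policy with an explicit criterion such as OLDEST_FIRST or YOUNGEST_FIRST so the choice is predictable. The policy selects by node age or profile age, not by load, so pair it with a grace period if nodes need time to finish in-flight requests.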
- Stuck actions. Senlin actions queue; a wedged action blocks the cluster. Check openstack cluster action list when a cluster stops responding to resizes.
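The first two failure modes can be guarded against with a few lines of shell. This is a sketch under assumptions: both function names, the file name, and the policy name are hypothetical, the thresholds are integers in whatever unit your alarms use (CPU percent, say), and the 60-second grace period is a placeholder.

```shell
# Hypothetical helpers; thresholds are integers in your alarm's unit (e.g. CPU %).

# Succeeds only if the scale-in threshold sits at least min_gap below scale-out.
check_dead_band() {
    scale_in="$1"
    scale_out="$2"
    min_gap="$3"
    [ $((scale_out - scale_in)) -ge "$min_gap" ]
}

# Write and attach a deletion policy that removes the oldest node first.
attach_deletion_policy() {
    cluster="$1"
    cat > deletion-policy.yaml <<'EOF'
type: senlin.policy.deletion
version: 1.0
properties:
  criteria: OLDEST_FIRST
  destroy_after_deletion: true
  grace_period: 60
EOF
    openstack cluster policy create --spec-file deletion-policy.yaml deletion-pol
    openstack cluster policy attach --policy deletion-pol "$cluster"
}
```

For example, check_dead_band 30 70 20 passes, while check_dead_band 60 70 20 fails because the triggers are only 10 points apart. I run a check like this before creating the alarms, not after the cluster has started flapping.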
I keep an AI prompt that takes a cluster’s action history plus the attached policies and tells me whether a flapping cluster is a cooldown problem or an overlapping-threshold problem — it disentangles the two faster than I do by eye. A few of these are in our prompt library.
Senlin vs Heat autoscaling
My rule: use Senlin for anything that needs to scale on load, self-heal, and integrate with a load balancer. Use Heat for orchestrating the surrounding infrastructure and let Heat reference the Senlin cluster as a resource. Heat’s own AutoScalingGroup works but couples scaling to stack updates, which gets painful. Senlin keeps the cluster lifecycle independent.
Where to go next
Senlin is the autoscaling primitive OpenStack should be famous for. Start with a health policy for self-healing, add a scaling policy driven by Aodh webhooks, and wire in an lb policy so your load balancer stays accurate. Mind the cooldowns and the deletion order and it runs itself. For the Octavia and Heat services it integrates with, see the OpenStack category.
Autoscaling configurations can scale costs as fast as capacity. Validate cooldowns and min/max bounds against your own load patterns before going live.