Immutable Infrastructure Patterns: Stop Patching, Start Replacing
Mutable servers drift, accumulate cruft, and fail unpredictably. Immutable infrastructure trades in-place changes for replacement — here's how to actually adopt it.
- #iac
- #immutable-infrastructure
- #deployment
- #golden-images
- #reliability
- #devops
The single most reliable production server I ever ran was the one nobody could SSH into. Not because it was locked down for security — though it was — but because there was no reason to. It was built from an image, deployed, and never modified. When it needed a change, we didn’t change it. We built a new image and replaced it.
That’s immutable infrastructure in one sentence: servers are never modified after deployment; they’re replaced. It sounds restrictive until you’ve spent a weekend chasing config drift on a snowflake box that “used to work.” This is how the pattern works, why it’s worth the discipline, and how to adopt it without rewriting everything.
The problem with mutable servers
A long-lived, mutable server accumulates entropy. Someone hotfixes a config at 2am and forgets to put it in Ansible. A package gets manually upgraded. A debug flag gets flipped and never flipped back. Six months later the server’s actual state is a mystery that exists in no version control anywhere — a snowflake. When it dies, you can’t rebuild it, because nobody knows what’s on it.
Config management tools (Ansible, Salt, Puppet) fight this by re-converging the server toward a declared state. That helps, but the server is still a moving target between converge runs, and anything outside the config management’s awareness drifts freely. Immutable infrastructure removes the target entirely.
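To make the gap concrete, here is a minimal sketch of why a converge run does not close it. Everything in it is hypothetical: the key names, the values, and the `converge` and `find_drift` helpers are illustrations, not the behavior of any particular tool. It assumes config management only resets the keys it declares, and leaves everything else alone.

```python
def converge(actual: dict, declared: dict) -> dict:
    """Return a copy of `actual` with every declared key reset to its declared value."""
    result = dict(actual)
    result.update(declared)
    return result


def find_drift(actual: dict, baseline: dict) -> dict:
    """Map each key that differs from the baseline to a (baseline, actual) pair."""
    keys = set(actual) | set(baseline)
    return {
        key: (baseline.get(key), actual.get(key))
        for key in sorted(keys)
        if baseline.get(key) != actual.get(key)
    }


# Hypothetical server state on the day it was provisioned.
baseline = {"nginx_workers": 4, "debug": False, "openssl": "3.0.2"}
# Config management declares, and so only knows about, two of those keys.
declared = {"nginx_workers": 4, "debug": False}
# Six months of hand edits: a debug flag flipped, a package upgraded manually.
actual = {"nginx_workers": 4, "debug": True, "openssl": "3.0.13"}

after_converge = converge(actual, declared)
remaining_drift = find_drift(after_converge, baseline)
print(remaining_drift)  # {'openssl': ('3.0.2', '3.0.13')}
```

The converge run puts the debug flag back, but the manually upgraded package survives because nothing declared it. With an image-based deploy there is no equivalent of `actual` to inspect, since the instance is discarded rather than reconciled.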
The core pattern: bake, deploy, replace
Immutable infrastructure has three moves:
- Bake a complete machine image — OS, dependencies, app, config — as a versioned artifact. This is the golden image, typically built with Packer.
- Deploy instances from that exact image. No post-deploy configuration; the image is the configuration.
- Replace to change anything. New version? Bake a new image, roll it out, terminate the old instances. Never modify a running one.
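# Packer template: the image IS the artifact
variable "app_version" { type = string }
variable "base_ami" { type = string }

locals {
  # Compact build timestamp, stripped of punctuation so it is safe in an AMI name.
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "app" {
  ami_name      = "app-${var.app_version}-${local.timestamp}"
  instance_type = "t3.medium"
  source_ami    = var.base_ami
  # region and ssh_username are also required; set them for your environment.
}

build {
  sources = ["source.amazon-ebs.app"]

  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y app=${var.app_version}",
    ]
  }
  # After this, the image never changes. To update, rebuild.
}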
Every image is tagged with a version. What you tested is byte-identical to what runs. There is no drift, because there’s no opportunity for it.
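One way to hold yourself to that claim is to check the fleet against the image you tested. The sketch below is illustrative only: the `Instance` record, the instance IDs, and the image IDs are hypothetical, and in practice the fleet list would come from your cloud provider's inventory API rather than a literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


def instances_off_image(fleet: list[Instance], expected_image: str) -> list[str]:
    """Return the IDs of instances that are not running the expected image."""
    return [i.instance_id for i in fleet if i.image_id != expected_image]


# Hypothetical fleet snapshot.
fleet = [
    Instance("i-aaa", "ami-app-1.4.2"),
    Instance("i-bbb", "ami-app-1.4.2"),
    Instance("i-ccc", "ami-app-1.4.1"),  # left over from the previous release
]

stale = instances_off_image(fleet, expected_image="ami-app-1.4.2")
print(stale)  # ['i-ccc']
```

An empty result means every instance was launched from the tested artifact. It says nothing about whether someone modified an instance after launch, which is why the pattern also depends on nobody having a way to do that.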
Blue-green and rolling replacement
“Replace instead of modify” is only safe if replacement is safe. Two patterns make it so:
- Rolling replacement: bring up new-image instances, health-check them, then drain and terminate old ones a few at a time. AWS Auto Scaling groups do this natively through an instance refresh, and most other orchestrators offer an equivalent.
- Blue-green: stand up a complete new fleet (green) from the new image alongside the old (blue), shift traffic at the load balancer, and keep blue around for instant rollback.
# AWS ASG instance refresh: replace all instances with the new image
InstanceRefresh:
  Strategy: Rolling
  Preferences:
    MinHealthyPercentage: 90
    InstanceWarmup: 120 # seconds to let new instances warm up
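The minimum healthy percentage is what sets the pace of a rolling replacement. As a rough sketch of the arithmetic, assuming the orchestrator keeps at least that share of the fleet in service and always replaces at least one instance per step (the helper names `max_batch_size` and `plan_batches` are hypothetical, and real schedulers may round differently):

```python
def max_batch_size(desired_capacity: int, min_healthy_pct: int) -> int:
    """How many instances may be out of service at once during a rolling replace."""
    if desired_capacity <= 0:
        raise ValueError("desired_capacity must be positive")
    if not 0 <= min_healthy_pct <= 100:
        raise ValueError("min_healthy_pct must be between 0 and 100")
    # Ceiling division: the number of instances that must stay healthy.
    min_healthy = -(-desired_capacity * min_healthy_pct // 100)
    # Assumption: progress is always possible, so never return less than one.
    return max(1, desired_capacity - min_healthy)


def plan_batches(instance_ids: list[str], min_healthy_pct: int) -> list[list[str]]:
    """Split the old fleet into replacement batches of the allowed size."""
    size = max_batch_size(len(instance_ids), min_healthy_pct)
    return [instance_ids[i:i + size] for i in range(0, len(instance_ids), size)]


print(max_batch_size(10, 90))  # 1
print(max_batch_size(20, 90))  # 2
print(len(plan_batches([f"i-{n:02d}" for n in range(20)], 90)))  # 10
```

At 90 percent, a 20-instance fleet is replaced two at a time over ten batches, each followed by the warmup period. Lowering the percentage speeds up the rollout at the cost of less spare capacity while it runs.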
Rollback becomes trivial: point the ASG or LB back at the previous image/fleet. No “undo the patch” — just deploy the old artifact again. This is the operational payoff that makes the discipline worth it.
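The shape of that rollback can be sketched in a few lines. This is a toy model, not a load balancer client: `BlueGreenRouter`, the fleet names, and the health-check callables are all hypothetical stand-ins. It refuses to cut over until the candidate fleet passes its health check, and it treats rollback as nothing more than re-pointing at the previous fleet.

```python
from typing import Callable, Optional


class BlueGreenRouter:
    """Tracks which fleet receives traffic and which one is kept for rollback."""

    def __init__(self, live_fleet: str) -> None:
        self.live = live_fleet
        self.previous: Optional[str] = None

    def cut_over(self, candidate: str, is_healthy: Callable[[str], bool]) -> bool:
        """Shift traffic to `candidate` only if it passes the health check."""
        if not is_healthy(candidate):
            return False  # traffic stays on the current fleet
        self.previous, self.live = self.live, candidate
        return True

    def roll_back(self) -> None:
        """Re-point traffic at the previous fleet; nothing is patched or undone."""
        if self.previous is None:
            raise RuntimeError("no previous fleet to roll back to")
        self.live, self.previous = self.previous, self.live


router = BlueGreenRouter("app-1.4.1")
switched = router.cut_over("app-1.4.2", is_healthy=lambda fleet: True)
print(switched, router.live)  # True app-1.4.2

router.roll_back()
print(router.live)  # app-1.4.1
```

The important ordering is inside `cut_over`: the health check runs before the switch, and the old fleet is retained rather than terminated, so rollback remains a pointer change.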
State is the hard part
The objection everyone raises: “my server has data on it.” Correct — and that’s the whole design constraint. Immutable infrastructure forces a clean separation:
- Stateless tier (web, app, workers) → fully immutable. Replace freely.
- State (databases, user uploads, caches) → externalized to managed services, network storage, or object stores that persist across instance replacement.
If your app writes important data to local disk on a stateless node, you can’t be immutable until you fix that. This is often the real work of adopting the pattern, and it’s worth doing regardless — a node you can destroy without data loss is a node you can scale, patch, and recover trivially.
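A quick audit of where a service keeps its data is a reasonable first step. The sketch below is a heuristic under stated assumptions: the `storage` mapping, its target strings, and the set of schemes treated as external are all hypothetical, and a plain path could still be a network mount that this check cannot see.

```python
from urllib.parse import urlparse

# Assumption: these URL schemes point at storage that outlives the instance.
EXTERNAL_SCHEMES = {"s3", "postgres", "postgresql", "redis", "nfs", "https"}


def local_state_targets(storage: dict[str, str]) -> list[str]:
    """Return the names of storage targets that appear to live on local disk."""
    local = []
    for name, target in storage.items():
        scheme = urlparse(target).scheme
        if scheme not in EXTERNAL_SCHEMES:
            local.append(name)  # bare paths and file:// URLs both land here
    return sorted(local)


# Hypothetical storage configuration for one service.
storage = {
    "database": "postgresql://db.internal:5432/app",
    "uploads": "/var/lib/app/uploads",
    "sessions": "redis://cache.internal:6379/0",
    "reports": "file:///srv/reports",
}

blockers = local_state_targets(storage)
print(blockers)  # ['reports', 'uploads']
```

Anything in the result is data that would be lost on replacement, so it needs a new home before the node can be treated as disposable.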
Security and ops side effects
Going immutable hands you benefits you didn’t ask for:
- No SSH in production. If nobody modifies servers, nobody needs shell access. Disable it. Your attack surface and your “who changed this?” mystery both shrink dramatically.
- Patching is a rebuild. A CVE in the base image? Rebuild the golden image with the patch, roll it out. No per-server apt upgrade, no half-patched fleet.
- Trivial horizontal scaling. Every instance is identical and disposable, so adding capacity is just launching more of the same image.
- Reproducible incident recovery. A bad node gets terminated and replaced automatically. “Turn it off and on again” becomes the actual, blessed remediation.
Where AI fits
The authoring work in immutable infrastructure is the image-build pipeline and the replacement orchestration — Packer templates, ASG/instance-refresh configs, blue-green traffic-shift scripts. I use an assistant to scaffold Packer builds from a dependency list, generate the instance-refresh or blue-green config for a given platform, and review a deploy script for the classic “terminated old fleet before health-checking new one” mistake. Keep a few golden image and deployment prompts handy, then test the full bake-deploy-replace loop in staging before trusting it in prod.
Adopting it without a big bang
You don’t rewrite everything Friday afternoon. Stage it:
- Pick one stateless service. Get its data off local disk if it isn’t already.
- Build a golden image for it with Packer; deploy from the image.
- Wire up rolling replacement via your ASG or orchestrator.
- Disable SSH to that fleet and prove you don’t need it.
- Make “rebuild the image” the only way to change it — and hold the line.
Once one service runs this way and you’ve recovered from a bad deploy by re-pointing at the old image, the pattern sells itself. The goal isn’t dogma; it’s eliminating the snowflake: the server whose real state lives only on the box itself and in nobody’s git history. Replace, don’t patch, and that whole category of 2am mystery disappears. For more, see our Infrastructure as Code guides.
Generated image pipelines and deployment configs are assistive, not authoritative. Validate the full bake-deploy-replace cycle in staging, including rollback, before relying on it in production.