Building Continuous Terraform Drift Detection Into Your Pipeline
Catching drift once it's caused an outage is too late. Here's how to run scheduled drift detection that surfaces out-of-band changes before they bite you.
Drift is the gap between what your Terraform code says should exist and what actually exists in the cloud. It creeps in constantly: someone toggles a setting in the console during an incident, an autoscaler rewrites a value, a different tool changes a tag, a manual “quick fix” never makes it back to code. By the time you find out — usually because your next apply wants to undo someone’s emergency change, or revert a fix you forgot about — it’s already a problem.
The cure isn’t heroics. It’s making drift visible continuously, so you learn about an out-of-band change the day it happens, not the day it breaks your deploy. Here’s how to build that into a pipeline.
What drift detection actually is
At its core, detecting drift is one command:
```shell
terraform plan -detailed-exitcode
```
The -detailed-exitcode flag is the whole trick. It changes the exit codes to:
- 0: no changes. Reality matches code.
- 1: error.
- 2: there are changes. Reality has drifted from code.
A clean periodic run that exits 0 means you’re in sync. An exit 2 on a config nobody touched means something changed the world out from under your code. That’s your signal.
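If you'd rather script the check than read exit codes by hand, here is a minimal Python sketch of that mapping. It assumes terraform is on PATH when you use the default command. The command is a parameter, so you can point it at a specific stack or substitute a stub. The names (DriftStatus, classify, check_drift) are illustrative, not part of any Terraform tooling.

```python
import subprocess
from enum import Enum
from typing import Optional, Sequence


class DriftStatus(Enum):
    IN_SYNC = 0  # exit 0: plan succeeded, no changes
    ERROR = 1    # exit 1: plan failed
    DRIFTED = 2  # exit 2: plan succeeded and is not empty


def classify(exit_code: int) -> DriftStatus:
    """Map a `terraform plan -detailed-exitcode` exit code to a status."""
    try:
        return DriftStatus(exit_code)
    except ValueError:
        # Anything outside 0/1/2 is treated as an error, never as drift.
        return DriftStatus.ERROR


def check_drift(workdir: str, cmd: Optional[Sequence[str]] = None) -> DriftStatus:
    """Run the plan in `workdir` and classify its exit code."""
    if cmd is None:
        # Default assumes terraform is on PATH.
        cmd = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]
    result = subprocess.run(list(cmd), cwd=workdir)
    return classify(result.returncode)
```

Treating unexpected exit codes as errors rather than drift keeps a crashed run from masquerading as a finding.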
The naive version, and why it’s not enough
You could throw terraform plan -detailed-exitcode in a cron job and alert on exit 2. That’s a real start, but it has sharp edges that’ll make people ignore the alerts within a week:
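- A drift run takes the state lock by default, so it can collide with a real deploy, and one of the two fails with a lock error.
- Refreshing against live cloud APIs can itself produce transient noise, such as throttling errors and timeouts.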
- A raw “there are changes” alert with no detail is unactionable — was it a security group opening to the world, or a tag casing difference?
A drift system people actually trust needs to be scheduled safely, produce specific findings, and route them somewhere humans will see them.
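As one sketch of the "scheduled safely" part, the Python wrapper below runs a read-only refresh-only plan. It passes -lock-timeout so the run waits for a deploy's state lock instead of failing immediately, retries errors with a linear backoff, and only reports drift once a second run agrees. The confirmation re-run is a design choice of this sketch, not something Terraform does for you. The timeout, attempt count, and backoff values are arbitrary, and the function names are hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence

# Read-only drift check. -lock-timeout waits for a deploy's state lock
# instead of failing immediately; the 10m value is arbitrary.
PLAN_CMD = (
    "terraform", "plan", "-refresh-only", "-detailed-exitcode",
    "-input=false", "-no-color", "-lock-timeout=10m",
)

Runner = Callable[[Sequence[str], str], int]


def run_plan(cmd: Sequence[str], workdir: str) -> int:
    """Run the command in `workdir` and return its exit code."""
    return subprocess.run(list(cmd), cwd=workdir).returncode


def confirmed_drift_check(
    workdir: str,
    runner: Runner = run_plan,
    cmd: Sequence[str] = PLAN_CMD,
    max_attempts: int = 3,
    backoff_seconds: float = 30.0,
) -> str:
    """Return "in_sync", "drifted", or "error".

    Exit 1 is retried with a linear backoff. Exit 2 is reported as drift
    only once a second run also exits 2, which filters one-off API noise.
    """
    drift_seen = False
    for attempt in range(max_attempts):
        code = runner(cmd, workdir)
        if code == 0:
            return "in_sync"
        if code == 2:
            if drift_seen:
                return "drifted"
            drift_seen = True  # re-run once to confirm before alerting
            continue
        # Exit 1 or anything unexpected: wait, then try again.
        if attempt < max_attempts - 1:
            time.sleep(backoff_seconds * (attempt + 1))
    # Out of attempts without a clean run or two agreeing drift results.
    return "error"
```

Your deploy pipeline can pass -lock-timeout too, so a deploy that starts mid-check waits for the lock rather than failing.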
A scheduled drift workflow
Here’s the shape of a GitHub Actions workflow that runs nightly across your configs and reports specifics:
```yaml
name: drift-detection

on:
  schedule:
    - cron: "0 6 * * *" # 06:00 UTC daily
  workflow_dispatch: {}

jobs:
  detect:
    runs-on: ubuntu-latest
    strategy:
      # Let every stack report even if another stack's plan errors out.
      fail-fast: false
      matrix:
        stack: [networking, platform, app]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          # Call the terraform binary directly so $? is terraform's own exit code.
          terraform_wrapper: false
      - name: Init
        working-directory: stacks/${{ matrix.stack }}
        run: terraform init -lockfile=readonly
      - name: Detect drift
        id: plan
        working-directory: stacks/${{ matrix.stack }}
        run: |
          set +e
          terraform plan -refresh-only -detailed-exitcode -input=false -no-color -out=drift.plan
          code=$?
          set -e
          echo "exitcode=$code" >> "$GITHUB_OUTPUT"
          # Exit code 1 is a real error: fail the job instead of hiding it.
          if [ "$code" -eq 1 ]; then exit 1; fi
      - name: Report drift
        if: steps.plan.outputs.exitcode == '2'
        working-directory: stacks/${{ matrix.stack }}
        run: |
          terraform show -no-color drift.plan > drift.txt
          # post drift.txt to Slack / open an issue / page, as appropriate
```
Two deliberate choices in there:
- -refresh-only focuses the run on “has reality diverged?” rather than “what would my code change?” It’s the cleanest way to ask the drift question without conflating it with pending code changes you haven’t merged yet.
- The matrix over stacks means each configuration reports independently, so a drift finding points at which stack drifted, not a single opaque pass/fail.
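To post specific findings rather than a wall of text, the saved plan can be rendered as JSON with terraform show -json drift.plan. In Terraform's JSON plan format, changes made outside Terraform appear in a resource_drift array. Each entry has an address and a change object with actions, before, and after. The sketch below (function names are illustrative) turns that into one line per drifted resource. It lists only the names of changed top-level attributes, because the JSON can contain sensitive values you wouldn't want in a chat channel. Verify the field names against the JSON output documentation for your Terraform or OpenTofu version.

```python
import json
import subprocess
from typing import Any, Dict, List


def load_plan_json(plan_file: str, workdir: str = ".") -> Dict[str, Any]:
    """Render a saved plan as JSON via `terraform show -json`."""
    out = subprocess.run(
        ["terraform", "show", "-json", plan_file],
        cwd=workdir, check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def changed_keys(before: Any, after: Any) -> List[str]:
    """Top-level attribute names whose values differ (None counts as empty)."""
    before = before or {}
    after = after or {}
    return sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))


def summarize_drift(plan: Dict[str, Any]) -> List[str]:
    """One line per drifted resource: address, action(s), changed attribute names."""
    lines = []
    for rc in plan.get("resource_drift", []):
        change = rc.get("change", {})
        actions = "/".join(change.get("actions", []))
        attrs = changed_keys(change.get("before"), change.get("after"))
        detail = ", ".join(attrs) if attrs else "(no attribute detail)"
        lines.append(f"{rc['address']} [{actions}]: {detail}")
    return lines
```

In the Report drift step, summarize_drift(load_plan_json("drift.plan")) produces a short list you can put at the top of the alert, with the full terraform show text attached below it.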
Check blocks turn drift detection into health checks
A powerful upgrade: pair drift detection with check blocks. Because checks evaluate on every plan — including your -refresh-only runs — they let you assert health, not just configuration sameness:
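```hcl
check "cert_not_expiring" {
  assert {
    # Fails if the certificate expires within the next 30 days (720h).
    condition     = timecmp(timeadd(plantimestamp(), "720h"), aws_acm_certificate.main.not_after) < 0
    error_message = "Certificate expired or expiring within 30 days."
  }
}
```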
Now your nightly run isn’t only catching “someone changed a setting”; it’s also surfacing “this cert is about to expire” and “this endpoint stopped returning 200.” The drift pipeline doubles as a cheap monitoring layer that lives next to the infrastructure it watches. One caveat: a failed check assertion is reported as a warning, not an error, so it doesn’t change the plan’s exit code, and a report step gated on exit code 2 won’t fire for it. (For more on these, we have a dedicated guide on check blocks and assertions in the Terraform category.)
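Because a failing check only produces a warning, one way to surface it from the nightly job is to read the checks array from the JSON plan (terraform show -json drift.plan). This sketch assumes the checks representation in Terraform's JSON output format: each entry has an address with a to_display name, an overall status, and instances whose problems carry a message. Treat those field names as an assumption to verify against your version's documentation. The function name is hypothetical.

```python
from typing import Any, Dict, List


def failing_checks(plan: Dict[str, Any]) -> List[str]:
    """One line per checkable object whose status is "fail" or "error"."""
    findings = []
    for check in plan.get("checks", []):
        status = check.get("status")
        if status not in ("fail", "error"):
            continue  # "pass" and "unknown" are not reported
        name = check.get("address", {}).get("to_display", "<unknown>")
        messages = [
            problem.get("message", "")
            for instance in check.get("instances", [])
            for problem in instance.get("problems", [])
        ]
        detail = "; ".join(messages) if messages else status
        findings.append(f"{name}: {detail}")
    return findings
```

The step that runs this has to run whenever the plan didn't error, not only on exit code 2; otherwise a failing check on an otherwise clean stack is never reported.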
Reduce the noise or people will mute it
The fastest way to kill a drift program is alert fatigue. A few tactics that keep findings meaningful:
- Ignore the values you know change out of band. If an autoscaler legitimately rewrites desired_count, put it in lifecycle { ignore_changes = [desired_count] } so it stops showing up as drift. Confirm that your drift job actually honors this, since a refresh-only run compares state with the real object rather than with your configuration.
- Separate “real drift” from “code ahead of reality.” Use -refresh-only for the drift job specifically, so unmerged code changes don’t pollute the signal.
- Make findings actionable. Post the actual terraform show output, not just “drift detected.” The person who gets paged should see what drifted in the alert itself.
- Triage, then decide reconcile vs codify. Each finding is a fork: either reality is wrong (apply your code to fix it) or your code is stale (update code to match an intentional change). Both are valid; the point is to decide, not let it linger.
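Where ignore_changes doesn't fit (or doesn't quiet your refresh-only output), a small post-filter on the drift job's JSON can drop findings whose only changed attributes are ones you expect to move out of band. The allow-list below is hypothetical: ECS desired_count and an autoscaling group's desired_capacity are just examples. The entries are assumed to come from the resource_drift array of terraform show -json. Resources deleted outside Terraform are always kept.

```python
from typing import Any, Dict, List, Set

# Hypothetical allow-list: attributes expected to change out of band,
# keyed by resource type. Adjust to your own stacks.
EXPECTED_OUT_OF_BAND: Dict[str, Set[str]] = {
    "aws_ecs_service": {"desired_count"},
    "aws_autoscaling_group": {"desired_capacity"},
}


def changed_keys(before: Any, after: Any) -> Set[str]:
    """Top-level attribute names whose values differ (None counts as empty)."""
    before = before or {}
    after = after or {}
    return {k for k in set(before) | set(after) if before.get(k) != after.get(k)}


def actionable_drift(plan: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep drift entries unless every changed attribute is expected noise."""
    keep = []
    for rc in plan.get("resource_drift", []):
        change = rc.get("change", {})
        expected = EXPECTED_OUT_OF_BAND.get(rc.get("type", ""), set())
        changed = changed_keys(change.get("before"), change.get("after"))
        # Deletions (after is None) are never treated as noise.
        if change.get("after") is not None and changed and changed <= expected:
            continue
        keep.append(rc)
    return keep
```

Keeping the list in code next to the drift job makes each "known noise" exception reviewable, instead of living in someone's head.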
Closing the loop
Detection without a response process is just a noisier way of ignoring drift. When a drift alert fires, the workflow should be: someone looks at the specific finding, decides whether to reconcile (re-apply code) or codify (update code to match an intentional manual change), and does it that day. The whole value is shrinking the window between “the world changed” and “we know and have a plan.”
Drift findings often arrive as a wall of plan output at 6am, and deciding fast whether a change is benign or dangerous is exactly where a structured review helps — a security group that gained an ingress rule deserves very different urgency than a re-ordered tag. Running drift output through a code review workflow helps separate the cosmetic from the alarming. For more on keeping infrastructure honest, see our other Terraform guides.
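A first-pass triage can be automated before a human looks. This sketch ranks entries from the JSON plan's resource_drift array. Resources deleted outside Terraform are high, tag-only changes are low, anything else on a security group or IAM policy resource is high, and the rest is medium. The rule sets are hypothetical starting points, not a standard. The aim is simply that the security group with a new ingress rule sorts to the top of the 6am report and the tag change sinks to the bottom.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical triage rules; tune them to your environment.
HIGH_RISK_TYPES = {
    "aws_security_group",
    "aws_security_group_rule",
    "aws_vpc_security_group_ingress_rule",
    "aws_iam_policy",
    "aws_iam_role_policy",
}
COSMETIC_ATTRS = {"tags", "tags_all"}
ORDER = {"high": 0, "medium": 1, "low": 2}


def changed_keys(before: Any, after: Any) -> Set[str]:
    """Top-level attribute names whose values differ (None counts as empty)."""
    before = before or {}
    after = after or {}
    return {k for k in set(before) | set(after) if before.get(k) != after.get(k)}


def severity(rc: Dict[str, Any]) -> str:
    change = rc.get("change", {})
    if change.get("after") is None:
        return "high"  # deleted outside Terraform
    changed = changed_keys(change.get("before"), change.get("after"))
    if changed and changed <= COSMETIC_ATTRS:
        return "low"  # only tags moved, even on a sensitive resource
    if rc.get("type") in HIGH_RISK_TYPES:
        return "high"
    return "medium"


def triage(plan: Dict[str, Any]) -> List[Tuple[str, str]]:
    """(severity, address) pairs, most urgent first."""
    results = [(severity(rc), rc["address"]) for rc in plan.get("resource_drift", [])]
    return sorted(results, key=lambda r: (ORDER[r[0]], r[1]))
```

Automated ranking only orders the queue; the reconcile-or-codify decision still belongs to the person reading the finding.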
Drift is inevitable in any real environment with more than one human and more than one tool. The teams that stay calm aren’t the ones who prevent all drift — they’re the ones who see it the same day it happens.
Drift detection commands and flags behave differently across Terraform and OpenTofu versions. Verify against your tooling and test the workflow on a non-critical stack first.