Skip to content
DevOps AI ToolKit
Newsletter
All guides
AI for Terraform By James Joyner IV · · 9 min read

Building Continuous Terraform Drift Detection Into Your Pipeline

Catching drift once it's caused an outage is too late. Here's how to run scheduled drift detection that surfaces out-of-band changes before they bite you.

  • #terraform
  • #drift-detection
  • #ci
  • #automation
  • #monitoring
  • #gitops

Drift is the gap between what your Terraform code says should exist and what actually exists in the cloud. It creeps in constantly: someone toggles a setting in the console during an incident, an autoscaler rewrites a value, a different tool changes a tag, a manual “quick fix” never makes it back to code. By the time you find out — usually because your next apply wants to undo someone’s emergency change, or revert a fix you forgot about — it’s already a problem.

The cure isn’t heroics. It’s making drift visible continuously, so you learn about an out-of-band change the day it happens, not the day it breaks your deploy. Here’s how to build that into a pipeline.

What drift detection actually is

At its core, detecting drift is one command:

terraform plan -detailed-exitcode

The -detailed-exitcode flag is the whole trick. It changes the exit codes to:

  • 0 — no changes. Reality matches code.
  • 1 — error.
  • 2 — there are changes. Reality has drifted from code.

A clean periodic run that exits 0 means you’re in sync. An exit 2 on a config nobody touched means something changed the world out from under your code. That’s your signal.

The naive version, and why it’s not enough

You could throw terraform plan -detailed-exitcode in a cron job and alert on exit 2. That’s a real start, but it has sharp edges that’ll make people ignore the alerts within a week:

  • A drift run that needs an apply lock can collide with a real deploy.
  • Refresh against a live API can itself surface transient noise.
  • A raw “there are changes” alert with no detail is unactionable — was it a security group opening to the world, or a tag casing difference?

A drift system people actually trust needs to be scheduled safely, produce specific findings, and route them somewhere humans will see them.

A scheduled drift workflow

Here’s the shape of a GitHub Actions workflow that runs nightly across your configs and reports specifics:

name: drift-detection
on:
  schedule:
    - cron: "0 6 * * *"   # 06:00 UTC daily
  workflow_dispatch: {}

jobs:
  detect:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        stack: [networking, platform, app]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - name: Init
        working-directory: stacks/${{ matrix.stack }}
        run: terraform init -lockfile=readonly

      - name: Detect drift
        id: plan
        working-directory: stacks/${{ matrix.stack }}
        run: |
          set +e
          terraform plan -refresh-only -detailed-exitcode -no-color -out=drift.plan
          echo "exitcode=$?" >> "$GITHUB_OUTPUT"
          set -e

      - name: Report drift
        if: steps.plan.outputs.exitcode == '2'
        working-directory: stacks/${{ matrix.stack }}
        run: |
          terraform show -no-color drift.plan > drift.txt
          # post drift.txt to Slack / open an issue / page, as appropriate

Two deliberate choices in there:

  • -refresh-only focuses the run on “has reality diverged?” rather than “what would my code change?” It’s the cleanest way to ask the drift question without conflating it with pending code changes you haven’t merged yet.
  • The matrix over stacks means each configuration reports independently, so a drift finding points at which stack drifted, not a single opaque pass/fail.

Check blocks turn drift detection into health checks

A powerful upgrade: pair drift detection with check blocks. Because checks evaluate on every plan — including your -refresh-only runs — they let you assert health, not just configuration sameness:

check "cert_not_expiring" {
  assert {
    condition     = timecmp(plantimestamp(), aws_acm_certificate.main.not_after) < 0
    error_message = "Certificate expired or expiring."
  }
}

Now your nightly run isn’t only catching “someone changed a setting” — it’s surfacing “this cert is about to expire” and “this endpoint stopped returning 200.” The drift pipeline doubles as a cheap monitoring layer that lives next to the infrastructure it watches. (For more on these, we have a dedicated guide on check blocks and assertions in the Terraform category.)

Reduce the noise or people will mute it

The fastest way to kill a drift program is alert fatigue. A few tactics that keep findings meaningful:

  • Ignore the values you know change out of band. If an autoscaler legitimately rewrites desired_count, put it in lifecycle { ignore_changes = [desired_count] } so it stops showing up as drift forever.
  • Separate “real drift” from “code ahead of reality.” Use -refresh-only for the drift job specifically, so unmerged code changes don’t pollute the signal.
  • Make findings actionable. Post the actual terraform show output, not just “drift detected.” The person who gets paged should see what drifted in the alert itself.
  • Triage, then decide reconcile vs codify. Each finding is a fork: either reality is wrong (apply your code to fix it) or your code is stale (update code to match an intentional change). Both are valid; the point is to decide, not let it linger.

Closing the loop

Detection without a response process is just a noisier ignore. When a drift alert fires, the workflow should be: someone looks at the specific finding, decides whether to reconcile (re-apply code) or codify (update code to match an intentional manual change), and does it that day. The whole value is shrinking the window between “the world changed” and “we know and have a plan.”

Drift findings often arrive as a wall of plan output at 6am, and deciding fast whether a change is benign or dangerous is exactly where a structured review helps — a security group that gained an ingress rule deserves very different urgency than a re-ordered tag. Running drift output through a code review workflow helps separate the cosmetic from the alarming. For more on keeping infrastructure honest, see our other Terraform guides.

Drift is inevitable in any real environment with more than one human and more than one tool. The teams that stay calm aren’t the ones who prevent all drift — they’re the ones who see it the same day it happens.

Drift detection commands and flags behave differently across Terraform and OpenTofu versions. Verify against your tooling and test the workflow on a non-critical stack first.

Free download · 368-page PDF

Download the Free 500-Prompt DevOps AI Toolkit

500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.

  • 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
  • Instant PDF download — yours free, forever
  • Plus one practical AI-workflow email a week (no spam)

Single opt-in · unsubscribe anytime · no spam.