Ephemeral Preview Environments That Don't Leak Cost

Ephemeral preview environments are one of those features that demo beautifully and rot quietly. The pitch is irresistible: open a pull request, and CI spins up a complete environment — service, database, queue — posts a preview URL to the PR, and tears it all down when the PR closes. Reviewers click a link instead of running things locally. Everyone loves it. Then a quarter later someone opens the cloud bill and finds forty orphaned databases, each born from a PR whose close-webhook quietly failed to fire.

That failure mode is not an edge case; it’s the default outcome of the naive design. This guide covers the two decisions that separate a preview-environment system you can trust from one that bleeds money: teardown that doesn’t depend on a webhook, and credentials that can’t reach production.

Never trust the close hook alone

The seductive design hangs teardown off the PR-close event. It works most of the time, which is exactly why it’s dangerous: it fails just rarely enough that nobody notices until the strays accumulate. Webhooks get dropped, jobs time out mid-destroy, a force-push confuses the state lookup, or someone closes a PR while CI is down. Any of these leaves an environment running with no event left to clean it up.

The fix is a second, independent destruction path: a time-based reaper that runs on a schedule and destroys any environment older than a TTL, regardless of PR state. The close hook is the fast path; the reaper is the safety net. Tag every resource so the reaper can find them:

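locals {
  pr_number = var.pr_number
  ttl_hours = 48
}

# Records the creation time once and keeps it in state (hashicorp/time provider).
# A bare timestamp() is re-evaluated on every apply, which would show a
# permanent diff and push ExpiresAt forward each time the pipeline runs.
resource "time_static" "created" {}

resource "aws_db_instance" "preview" {
  identifier        = "preview-pr-${local.pr_number}"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20

  # engine, username, and password omitted for brevity.
  # Without this, destroy fails because no final snapshot name is set.
  skip_final_snapshot = true

  tags = {
    Environment = "preview"
    PRNumber    = local.pr_number
    Owner       = var.pr_author
    ExpiresAt   = timeadd(time_static.created.rfc3339, "${local.ttl_hours}h")
    ManagedBy   = "preview-env-pipeline"
  }
}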

The reaper is then a scheduled job that lists resources tagged Environment=preview with an ExpiresAt in the past and destroys their stacks:

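#!/usr/bin/env bash
# Runs on a schedule, independent of any PR webhook.
# list_preview_stacks, stack_tag, and destroy_stack stand in for your own
# tooling. date -d requires GNU date.
set -euo pipefail
now=$(date -u +%s)
failed=0

# Assign first so a failed listing aborts the run instead of looking like "nothing to reap".
stacks=$(list_preview_stacks)

for stack in $stacks; do
  expires=$(stack_tag "$stack" ExpiresAt)
  if [ "$(date -u -d "$expires" +%s)" -lt "$now" ]; then
    echo "Reaping expired preview stack: $stack"
    # Idempotent: a re-run cleans a half-destroyed stack.
    # One failed destroy must not stop the sweep of the remaining stacks.
    destroy_stack "$stack" || { echo "Failed to reap: $stack" >&2; failed=1; }
  fi
done

exit "$failed"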
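
One thing a loop like this leaves implicit is what happens when the ExpiresAt tag is missing or malformed. The sketch below is one possible way to make that decision explicit. It assumes GNU date and RFC 3339 timestamps, and it assumes that in a preview-only account an unreadable tag should count as expired; the function name is_expired is illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an ExpiresAt tag value is in the past.
# Assumes GNU date (for -d) and RFC 3339 values such as 2024-01-01T00:00:00Z.

# is_expired EXPIRES_AT [NOW_EPOCH]
# Returns 0 (reap) if the timestamp is in the past, empty, or unparseable,
# and 1 (keep) otherwise. NOW_EPOCH can be injected to make the check testable.
is_expired() {
  local expires_at="${1:-}"
  local now="${2:-$(date -u +%s)}"
  local expires_epoch

  # GNU date parses an empty string as "today at midnight", so check it first.
  if [ -z "$expires_at" ]; then
    return 0
  fi

  # Assumption: a tag we cannot parse is treated as expired, not skipped.
  if ! expires_epoch=$(date -u -d "$expires_at" +%s 2>/dev/null); then
    return 0
  fi

  [ "$expires_epoch" -lt "$now" ]
}
```

If silently reaping on a bad tag feels too aggressive for your setup, the same function could return a distinct status for the unparseable case so the job can alert instead.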

The reaper must be idempotent, because the stacks it cleans are often half-created or half-destroyed. Re-running should converge toward “gone,” never compound the mess.
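
As a rough illustration of "converge toward gone," here is a minimal sketch of a destroy wrapper. The helpers stack_exists and run_destroy are hypothetical; you would implement them against whatever state lookup and destroy command your pipeline actually uses. The retry count of three is likewise an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Sketch of an idempotent destroy wrapper.
# stack_exists and run_destroy are hypothetical helpers supplied by you.

# destroy_stack STACK [MAX_ATTEMPTS]
# Succeeds if the stack is gone when the function returns, whether this call
# removed it, an earlier run did, or it never finished being created.
destroy_stack() {
  local stack="$1"
  local max_attempts="${2:-3}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # "Already gone" counts as success, so re-running is always safe.
    if ! stack_exists "$stack"; then
      return 0
    fi
    # A failed destroy is not fatal here; the next pass re-checks the state.
    run_destroy "$stack" || echo "destroy attempt $attempt failed for $stack" >&2
    attempt=$((attempt + 1))
  done

  # Final check after the last attempt: non-zero if the stack is still there.
  ! stack_exists "$stack"
}
```

The key design choice is that success is defined by the end state (the stack no longer exists), not by whether the destroy command itself exited cleanly.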

Deterministic naming makes everything findable

Both the spin-up and the reaper depend on a naming scheme keyed to the PR number. preview-pr-1234 is collision-free across concurrent PRs in a repository, trivially greppable in the console, and gives the reaper an unambiguous target. Resist the temptation to key on branch name (branch names get reused and can contain characters that are invalid in resource names) or a random ID (the reaper can’t correlate it back to a PR). Within a repository the PR number only increases, is never reused, and is available to any CI job triggered by a pull-request event. If several repositories share one preview account, add a repository prefix so the names stay unique.
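
A small helper can keep the naming rule in one place so the spin-up job and the reaper cannot drift apart. This is a minimal sketch; the function name preview_name is an assumption, and the digit check is there to catch a branch name or an empty CI variable being passed in by mistake.

```shell
#!/usr/bin/env bash
# Sketch: derive the deterministic preview name from a PR number.

# preview_name PR_NUMBER
# Prints preview-pr-<number>, or fails if the input is not a string of digits.
preview_name() {
  local pr_number="${1:-}"
  case "$pr_number" in
    '' | *[!0-9]*)
      echo "preview_name: expected a numeric PR number, got '$pr_number'" >&2
      return 1
      ;;
  esac
  printf 'preview-pr-%s\n' "$pr_number"
}
```

A CI job might call it as name=$(preview_name "$PR_NUMBER"), where PR_NUMBER is a placeholder for whatever variable your CI system exposes on pull-request events.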

Isolation: preview code is not reviewed code

Here’s the security reality teams underweight: a preview environment runs the IaC from a branch that, by definition, hasn’t been merged or fully reviewed. If the spin-up job uses production credentials, then opening a PR that modifies the infrastructure code is a path to running arbitrary changes with prod access. That’s not hypothetical — it’s the structural shape of the system unless you isolate it.

The rule is simple and non-negotiable: preview environments live in a separate account, project, or at minimum a separate namespace, with scoped credentials that cannot touch production state, secrets, or data. The spin-up job’s role should be able to create and destroy preview resources and nothing else. This also makes the reaper safe — a destroy job with prod access is a liability; a destroy job scoped to the preview account is routine.
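
Scoped credentials are the real control, but a cheap preflight check in the spin-up and reaper jobs can catch a misconfigured runner before it touches anything. The sketch below assumes AWS and compares the account behind the current credentials with the expected preview account. The function name and the idea of passing the expected account ID as an argument are illustrative assumptions; the lookup uses the AWS CLI's sts get-caller-identity command.

```shell
#!/usr/bin/env bash
# Sketch: refuse to run unless the active credentials belong to the preview account.
# This is defense in depth, not a substitute for a narrowly scoped role.

# assert_preview_account EXPECTED_ACCOUNT_ID [ACTUAL_ACCOUNT_ID]
# If ACTUAL_ACCOUNT_ID is omitted, it is looked up with the AWS CLI.
assert_preview_account() {
  local expected="$1"
  local actual="${2:-}"

  if [ -z "$actual" ]; then
    actual=$(aws sts get-caller-identity --query Account --output text) || return 1
  fi

  if [ "$actual" != "$expected" ]; then
    echo "Refusing to run: credentials are for account $actual, not preview account $expected" >&2
    return 1
  fi
}
```

Calling this as the first step of both jobs means a runner that has accidentally picked up production credentials fails before any create or destroy command runs.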

Using AI to design the pipeline, then hardening it

This is a good task to draft with an LLM and harden by hand. A prompt:

You are a platform engineer. Design a per-PR ephemeral environment pipeline using OpenTofu. Provision on PR open, tear down on close, and add a scheduled reaper that destroys anything older than 48h regardless of PR state. Tag everything for cost attribution. Isolate credentials from production. Show the CI jobs and the reaper.

The model produced a workable pipeline and flagged the trap teams fall into:

I implemented teardown on PR close AND the scheduled reaper as you asked. One caution: I noticed your spin-up job reuses the shared CI role. For preview environments running unreviewed branch code, I’d strongly recommend a dedicated, scoped role in a separate account — otherwise a PR can modify the IaC and run it with whatever the CI role can reach.

That caution is the part a human signs off on. The model surfaced the isolation gap, but provisioning a separate account and scoping the role is an organizational decision, not something to let an automated pipeline assume. Use AI to write the boilerplate; keep the blast-radius decisions human.

The economics only work with teardown

Preview environments are a productivity multiplier when they’re cheap and disposable, and a recurring incident when they leak. The whole value proposition assumes they cost nothing when idle — which is only true if they actually go away. Pick the smallest viable resource sizes, set hard cost limits, and let the reaper be the thing you trust rather than the webhook.

For generating the full pipeline, see our ephemeral preview environments prompt, and pair it with the cost estimation CI gate prompt to catch expensive environments before they’re created. The Infrastructure as Code category covers the surrounding pipeline tooling. Build the reaper first, the spin-up second — the cleanup is the hard part, and it’s the part that protects the bill.