GitLab CI/CD environment:auto_stop_in Ephemeral Cleanup Prompt
Auto-expire and tear down ephemeral environments using environment:auto_stop_in and on_stop jobs, so review apps and dynamic stacks don't leak cost or orphaned resources.
- Target user: DevOps engineers managing review apps and dynamic environments
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who keeps ephemeral environments cheap and self-cleaning, so no orphaned namespaces, DNS records, or cloud resources survive a merged or stale MR.
I will provide:
- My deploy job and `environment:` block
- Where ephemeral stacks live (K8s namespace, ECS, Cloud Run, Heroku-style)
- How environments are currently torn down (manually? never?)
- Cost pain points and orphan examples
Your job:
1. **Lifecycle model** — map the full ephemeral lifecycle: create on MR open → update on push → auto-stop after inactivity → delete environment record. Explain how `environment.action: start | stop | prepare` ties jobs together.
2. **`auto_stop_in` wiring** — show the deploy job setting `environment: { name, url, on_stop, auto_stop_in: "2 days" }` and the matching `on_stop` job with `environment: { name, action: stop }`, `when: manual`, and `rules` that allow auto-trigger. Clarify that the timer starts/refreshes on each successful deploy.
3. **The teardown job** — write the `on_stop` job body that actually destroys resources (kubectl delete namespace, helm uninstall, terraform destroy, or cloud CLI), including `GIT_STRATEGY: none` considerations and how it gets the right env vars after the branch may be gone.
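4. **Stop-on-merge / stop-on-close** — show how merge request pipelines (`rules` on `$CI_PIPELINE_SOURCE == "merge_request_event"`) let GitLab run the `on_stop` job automatically when the MR merges or closes, not only on the timer. Note that `CI_MERGE_REQUEST_EVENT_TYPE` only distinguishes `detached`, `merged_result`, and `merge_train` pipelines, so it cannot detect a merge or close by itself.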
5. **Belt-and-suspenders sweeper** — a scheduled pipeline that lists environments via the GitLab API (`/environments?states=available`), finds ones past TTL or tied to closed MRs, and stops them — to catch anything the inline timer missed.
6. **Observability** — surface active ephemeral environments and their age, and alert when count or cost crosses a threshold.
7. **Validation** — a test plan proving: timer fires, on_stop runs, resources are gone, and the environment shows "stopped" in the UI.
Output as: (a) full `.gitlab-ci.yml` deploy + on_stop jobs, (b) the scheduled sweeper script, (c) the destroy commands per platform I named, (d) a runbook for manually stopping a stuck environment.
Bias toward: idempotent teardown, no orphaned cloud resources, cost visibility.
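
For reference, here is a minimal sketch of the wiring described in steps 2 and 3, assuming one Kubernetes namespace per review app. The job names, the `review/` naming scheme, the `example.com` URL, and the `kubectl` commands are placeholders rather than my real setup, so adapt them to what I provide:

```yaml
# Sketch only: job names, namespace naming, and the example.com URL are
# illustrative assumptions, not a definitive implementation.
deploy_review:
  stage: deploy
  script:
    # Create the namespace if it is missing, so reruns stay idempotent.
    - kubectl create namespace "review-$CI_COMMIT_REF_SLUG" --dry-run=client -o yaml | kubectl apply -f -
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: https://$CI_COMMIT_REF_SLUG.example.com
    on_stop: stop_review        # must match the stop job's name
    auto_stop_in: 2 days        # timer restarts on each successful deploy
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

stop_review:
  stage: deploy
  variables:
    GIT_STRATEGY: none          # the branch may already be gone, so skip checkout
  script:
    # --ignore-not-found keeps the teardown safe to run more than once.
    - kubectl delete namespace "review-$CI_COMMIT_REF_SLUG" --ignore-not-found
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: manual
      allow_failure: true       # keeps the unplayed manual job from blocking the pipeline
```

The stop job uses the same `rules` condition as the deploy job so that both land in the same pipeline, which GitLab needs in order to trigger `on_stop` on its own.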