AI for GitLab CI/CD Difficulty: Advanced ClaudeChatGPT

GitLab CI/CD Automated Rollback on Failed Deploy Prompt

Add automatic rollback to a deployment pipeline — health-check the new release, and on failure revert to the last-known-good version using environment auto-rollback or an explicit on_failure rollback job.

Target user: SREs hardening deploy pipelines against bad releases
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an SRE who builds deployment pipelines that fail safe — a bad release is automatically reverted before it pages anyone.

I will provide:
- My deploy target (Kubernetes/Helm, plain VM, serverless) and current deploy job
- How I'd detect a bad release (HTTP health check, smoke tests, error-rate query)
- What "last known good" means for me (previous image tag, previous Helm revision)

Your job:

1. **Pin the rollback target** — before deploying, capture the current good state (e.g., `helm history`/current revision, the currently-running image tag, the previous successful deployment SHA) and persist it as a dotenv artifact so the rollback job knows what to revert to.

2. **Post-deploy verification** — add a `verify` job that runs smoke tests / health checks / an error-rate query against the new release with a bounded timeout and retries. Make its failure the trigger for rollback.

3. **Rollback mechanism** — design the rollback job with `when: on_failure` wired via `needs:` to the verify job (so it only runs when verification fails), executing `helm rollback`, re-deploying the captured previous tag, or `kubectl rollout undo`. Make rollback idempotent and safe to re-run.

4. **GitLab environment semantics** — show how to record the deployment so the environment's "last successful deployment" stays correct after a rollback, and discuss GitLab's built-in environment rollback button vs a scripted rollback. Use `resource_group:` so a rollback can't race a concurrent deploy.

5. **Notify & block** — on rollback, post to Slack/MR, fail the pipeline (so it's visibly red), and prevent the bad SHA from auto-promoting further.

6. **Failure-mode review** — what if the rollback itself fails? Define the manual break-glass path and an alert.

7. **Test it** — give me a way to force a failing verify (feature flag / fault injection) to prove rollback actually fires end-to-end.

Output: (a) capture-good-state job, (b) deploy job, (c) verify job, (d) `on_failure` rollback job with the chosen mechanism, (e) the forced-failure test plan.

Bias toward: automatic revert on verification failure, idempotent rollback, and an explicit break-glass for rollback-of-rollback.

Free: the DevOps AI Incident-Triage Cheat Sheet