AI-Generated Rollback Jobs for GitLab CI Deployments

It was 4:52 PM on a Friday. The deploy went green. I closed my laptop. Eleven minutes later my phone lit up: error rates climbing, checkout failing, the on-call channel filling with the specific kind of calm that means everyone is quietly panicking.

Here is the part that still stings. We had a beautiful deploy job. Environments, approvals, Slack notifications, the works. What we did not have was a rollback job. So I did what every engineer does in that moment — I opened a terminal and started typing kubectl commands from memory, hoping I remembered the previous image tag, hoping I had the right context selected, hoping I did not fat-finger a namespace at 5 PM on a Friday.

We recovered. But the lesson was permanent: a deploy pipeline without a rollback path is half a pipeline. And the fastest way I have found to close that gap is to let an AI assistant draft the rollback scaffolding directly from my existing deploy job — then review it like I would review a junior engineer’s first PR.

The Mental Model: AI Is a Fast Junior Engineer

Before any YAML, set expectations. An AI model is a fast, tireless junior engineer who has read every .gitlab-ci.yml on the internet. It is excellent at turning “here is my deploy job, write me the inverse” into a working first draft in seconds.

It is also a junior engineer who has never seen your cluster, does not know which environments are protected, and will confidently invent a helm flag that does not exist. So two rules, non-negotiable:

1. Review every line before merge. A rollback job that is subtly wrong is worse than no rollback job, because you will run it under pressure and trust it.
2. Never give it real cluster credentials or CI secrets. Paste your deploy job structure, not your kubeconfig, not your $KUBE_TOKEN, not your Stripe key. The AI does not need secrets to write the scaffolding; it needs the shape of your jobs.

With that framing, here is how I actually do it. I keep a reusable prompt for this in my prompt workspace so I am not re-explaining context every time.

Start From Your Deploy Job

You cannot ask for a good rollback without showing the deploy. Here is a representative Kubernetes deploy job:

deploy_production:
  stage: deploy
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]  # clear the image's kubectl entrypoint so the runner can start a shell
  environment:
    name: production
    url: https://app.example.com
    on_stop: stop_production
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  script:
    - kubectl set image deployment/web web=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/web -n production --timeout=180s

I paste exactly this into Claude or ChatGPT with a prompt like: “Here is my GitLab deploy job. Draft a manual rollback job that reverts to the previously deployed image. Use GitLab environments. Do not assume any secrets.” What comes back is a starting point — not a finished artifact.

Capture the Previous Version as an Artifact

The single most common bug in AI-drafted rollbacks: the model assumes you already know the previous image tag. You usually do not. So the first thing I have it add is a step that records the currently-running version before the deploy overwrites it.

deploy_production:
  stage: deploy
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]
  environment:
    name: production
    url: https://app.example.com
    on_stop: stop_production
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  script:
    - |
      PREV_IMAGE=$(kubectl get deployment/web -n production \
        -o jsonpath='{.spec.template.spec.containers[0].image}')
      echo "PREV_IMAGE=$PREV_IMAGE" > deploy.env
    - kubectl set image deployment/web web=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/web -n production --timeout=180s
  artifacts:
    reports:
      dotenv: deploy.env

That dotenv artifact is the trick. GitLab automatically loads PREV_IMAGE as a variable into jobs in later stages of the same pipeline. Now the rollback job has somewhere to read the previous tag from instead of guessing.
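
To make that handoff concrete, here is a minimal sketch of the stage ordering it depends on, plus a throwaway job that prints the inherited value. The build stage name, the show_prev_image job, and the alpine:3.20 image are assumptions for illustration and are not part of the original pipeline.

```yaml
# Assumed stage order. The rollback stage must come after deploy so that
# jobs in it can inherit dotenv variables from deploy_production.
stages:
  - build      # hypothetical earlier stage
  - deploy
  - rollback

# Hypothetical helper job: confirms PREV_IMAGE actually arrived.
show_prev_image:
  stage: rollback
  image: alpine:3.20
  needs:
    - job: deploy_production
      artifacts: true   # pull in the dotenv report from the deploy job
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  script:
    - echo "Previous image recorded by deploy_production is ${PREV_IMAGE:-unset}"
```

If this job prints "unset", the rollback job would see the same empty value, so it is worth running once before you rely on the handoff.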

Pro Tip: AI loves to write rollbacks that “redeploy the previous commit.” Resist it. Rolling back by re-running CI rebuilds an image and re-runs migrations — slow and risky. You want to flip the running workload back to an artifact you already know is good. Capture the version, do not recompute it.

The Manual, Gated Rollback Job

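Now the inverse job. The two attributes that matter most here are when: manual (a human pushes the button, never automatic) and an environment: block that names production, so the job is tied to that environment in GitLab. I use action: prepare rather than action: stop, because prepare associates the job with the environment without recording a new deployment, whereas stop would mark production as stopped.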

rollback_production:
  stage: rollback
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]
  needs:
    - job: deploy_production
      artifacts: true
  environment:
    name: production
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  allow_failure: false
  script:
    - test -n "$PREV_IMAGE" || { echo "No PREV_IMAGE recorded"; exit 1; }
    - echo "Rolling production back to $PREV_IMAGE"
    - kubectl set image deployment/web web=$PREV_IMAGE -n production
    - kubectl rollout status deployment/web -n production --timeout=180s

Note the guard: if PREV_IMAGE is empty, the job fails loudly with a clear message instead of running kubectl set image ... web= with an empty image reference. The AI’s first draft omitted that check. This is exactly the kind of thing review catches and a tired human at 5 PM does not.
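
The guard can be taken one step further. The following is a sketch under stated assumptions, not the job from the incident: the job name rollback_production_guarded is hypothetical, and it assumes the container in the Deployment is named web, as the set image commands above imply. It selects the container by name rather than by position and refuses to proceed when the recorded image is the one already running.

```yaml
# Sketch: rollback_production with a stricter guard (job name is hypothetical).
rollback_production_guarded:
  stage: rollback
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]   # let the runner start a shell instead of kubectl
  needs:
    - job: deploy_production
      artifacts: true
  environment:
    name: production
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  script:
    - |
      # Fail fast if the deploy job never recorded a previous image.
      test -n "$PREV_IMAGE" || { echo "No PREV_IMAGE recorded"; exit 1; }
      # Read the image currently set on the container named "web".
      CURRENT_IMAGE=$(kubectl get deployment/web -n production \
        -o jsonpath='{.spec.template.spec.containers[?(@.name=="web")].image}')
      # Equal values suggest PREV_IMAGE was captured after the bad deploy,
      # for example when the deploy job was retried.
      if [ "$CURRENT_IMAGE" = "$PREV_IMAGE" ]; then
        echo "PREV_IMAGE matches the running image, refusing to continue"
        exit 1
      fi
      echo "Rolling production back from $CURRENT_IMAGE to $PREV_IMAGE"
      kubectl set image deployment/web web="$PREV_IMAGE" -n production
      kubectl rollout status deployment/web -n production --timeout=180s
```

The equality check exists because a retried deploy job re-records whatever is running at that moment, which by then is the bad image.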

Let kubectl Do the Rollback for You

There is an even simpler path the AI will often suggest, and for plain Kubernetes Deployments it is the most robust: kubectl rollout undo. The Deployment’s own revision history is your rollback artifact — no dotenv plumbing required.

rollback_production_native:
  stage: rollback
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]
  environment:
    name: production
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  script:
    - kubectl rollout undo deployment/web -n production
    - kubectl rollout status deployment/web -n production --timeout=180s

This reverts to the immediately previous ReplicaSet. The caveat — and the AI will not warn you about it unless you ask — is that rollout undo only fixes the workload spec. It does not undo a database migration, a feature flag, or a config map. Treat it as a workload rollback, not a system rollback.
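
When the immediately previous revision is not the one you want, kubectl can target a specific one. A minimal sketch, assuming a hypothetical ROLLBACK_REVISION variable that you override when triggering the manual job; the job name is also illustrative.

```yaml
# Sketch: roll back to a chosen Deployment revision (names are hypothetical).
rollback_production_to_revision:
  stage: rollback
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]   # let the runner start a shell instead of kubectl
  environment:
    name: production
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  variables:
    ROLLBACK_REVISION: "0"   # 0 is kubectl's default and means the previous revision
  script:
    # Print the revision history first so the job log shows what was available.
    - kubectl rollout history deployment/web -n production
    - kubectl rollout undo deployment/web -n production --to-revision="$ROLLBACK_REVISION"
    - kubectl rollout status deployment/web -n production --timeout=180s
```

Leaving the variable at 0 behaves like the plain rollout undo job above; setting it to a number from the history output pins the target.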

Helm Deployments: Use `helm rollback`

If you ship with Helm, you get the cleanest rollback story of all, because Helm tracks every release revision. Ask the AI to convert the kubectl rollback into a Helm one and it will hand you something close to this:

rollback_production_helm:
  stage: rollback
  image:
    name: alpine/helm:3.15.2
    entrypoint: [""]  # clear the image's helm entrypoint so the runner can start a shell
  environment:
    name: production
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  script:
    - helm history web -n production --max 5
    - helm rollback web -n production --wait --timeout 3m

helm rollback web with no revision number rolls back to the previous revision, whether or not that revision was healthy; pass an explicit revision (e.g. helm rollback web 7 -n production) to target a known-good one from the helm history output. The --wait flag blocks until the rolled-back resources are ready or the timeout expires, which is exactly what you want before declaring victory.
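
Targeting an explicit revision can be wired into the job itself. This is a sketch rather than a drop-in: HELM_REVISION is a hypothetical variable you would override when triggering the manual job, and the job name is illustrative.

```yaml
# Sketch: Helm rollback to a chosen revision (names are hypothetical).
rollback_production_helm_to_revision:
  stage: rollback
  image:
    name: alpine/helm:3.15.2
    entrypoint: [""]   # let the runner start a shell instead of helm
  environment:
    name: production
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  variables:
    HELM_REVISION: "0"   # 0 tells helm rollback to use the previous revision
  script:
    # Record the release history in the job log before changing anything.
    - helm history web -n production --max 5
    - helm rollback web "$HELM_REVISION" -n production --wait --timeout 3m
    # Show the resulting release state for the postmortem record.
    - helm status web -n production
```

Set HELM_REVISION to a revision number from the helm history output when the previous revision is itself suspect.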

Pro Tip: Have the AI add helm history as the first line of every Helm rollback job. The log output becomes a permanent record in the GitLab job of what the state was at rollback time — invaluable during the postmortem you will inevitably write.

The `on_stop` Hook and GitLab’s Built-in Environment Rollback

GitLab itself has a rollback button. When a deploy job declares environment:name, the Deployments page shows every past deploy with a “Rollback” action that re-runs the deploy job for that older commit. That is the GitLab-native path, and it is genuinely useful for stateless apps.

The companion piece is on_stop, which wires a teardown job to the environment so GitLab can stop it cleanly:

stop_production:
  stage: rollback
  image:
    name: bitnami/kubectl:1.30
    entrypoint: [""]
  environment:
    name: production
    action: stop
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
  script:
    - kubectl scale deployment/web --replicas=0 -n production

I keep both: the GitLab-native rollback for routine “revert to that commit” cases, and the dedicated helm rollback / set image jobs for fast, surgical reversion that does not rebuild anything.

Gate It With Protected Environments

Here is the requirement no AI will add unless you tell it to, because it lives in GitLab settings, not in YAML: who is allowed to press the rollback button. Under Settings → CI/CD → Protected environments (a Premium and Ultimate feature), restrict production so only senior engineers or a specific group can deploy and roll back. The when: manual job will still appear, but GitLab refuses to run it for anyone who is not on the environment's allowed-to-deploy list.

That combination — when: manual in the YAML plus a protected environment in settings — is what turns “anyone can nuke prod” into “a named, authorized human deliberately rolled back.” The AI gives you the first half. You must configure the second.

When the draft comes back, I give it a second-pass review, sometimes as a formal code review, looking specifically for invented flags, missing guards, and any place the model assumed a secret it should not have.

Conclusion

That Friday outage surfaced eleven minutes after a green deploy and cost us a weekend of nerves, all because we had a deploy job and no inverse. The fix was not heroic. It was a rollback job, and AI drafted the bulk of it from my existing deploy config in the time it took to get coffee.

Lean on that. An AI assistant is a brilliant fast junior for turning a deploy job into a rollback scaffold, generating the Helm and kubectl variants, and reminding you about artifacts you would have forgotten. But it is still a junior: review every line, add the guards it skips, configure the protected environments it cannot see, and never, ever hand it a real credential. A wrong rollback at 5 PM is worse than no rollback at all — so make sure the one you merge is one you understood first.
