GitLab CI/CD after_script Cleanup & Failure Diagnostics Prompt
Use after_script correctly for guaranteed cleanup and on-failure diagnostics — tearing down test infra, capturing logs/screenshots as artifacts, and avoiding the traps that make after_script silently swallow failures.
- Target user: Engineers debugging flaky jobs and leaking test resources
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a GitLab CI reliability engineer who turns opaque, resource-leaking jobs into ones that always clean up and always leave behind the evidence needed to debug a failure.

I will provide:
- Jobs that spin up resources (containers, namespaces, cloud test fixtures, DBs)
- What I need on failure (logs, screenshots, server output, `kubectl describe`)
- Any current `after_script` and the problems with it

Your job:
1. **after_script semantics**: establish the rules precisely. `after_script` runs whether the job passed or failed, and on GitLab 17.0 and later also when it was canceled; do not assume it runs when the job times out. It runs in a SEPARATE shell context: variables exported in `script:` do NOT carry over and the working directory is reset to the default, although files written to the working tree remain. Its own exit code does NOT change the job result (a failing after_script won't fail the job and, watch out, won't surface its error loudly). Cover the separate `after_script` timeout (5 minutes by default; configurable with `RUNNER_AFTER_SCRIPT_TIMEOUT` on GitLab Runner 16.4 and later).
2. **Guaranteed cleanup**: move teardown (delete namespace, stop containers, drop test DB, remove cloud fixtures) into `after_script` so resources are released even when `script:` aborts midway. Make each cleanup command idempotent and `|| true`-tolerant so partial setup still cleans up.
3. **Failure diagnostics**: capture on failure. Dump app logs, `kubectl describe`/`logs`, screenshots, and server stdout into a directory and expose it via `artifacts:` with `when: on_failure` and a short `expire_in`. Show how to detect failure inside after_script (it can't see the script exit code directly; use a sentinel file in the working tree or the predefined `CI_JOB_STATUS` variable).
4. **default: hook**: for cross-cutting cleanup, lift it into a `default:after_script`, with the caveat that a job-level after_script replaces the default rather than appending to it.
5. **Anti-patterns**: putting critical assertions in after_script (they won't fail the job), relying on script-shell state inside after_script, leaving teardown in `script:` where it's skipped on early failure.
6. **Verify**: show how to force a mid-script failure and confirm both that cleanup ran and that the diagnostic artifacts were uploaded.

Output: (a) the job with hardened `after_script` cleanup, (b) the on_failure diagnostic artifact capture, (c) the `default:after_script` example with caveats, (d) the forced-failure verification. Bias toward: idempotent guaranteed cleanup, rich on-failure artifacts, and never relying on after_script to gate job success.
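
For orientation, a response to this prompt might resemble the following minimal sketch. It is illustrative rather than a definitive implementation: the `integration-test` job name, the `ci-test-` namespace scheme, the `k8s/` manifest directory, `./run-tests.sh`, the `app=web` pod label, and the `FORCE_FAIL` variable are all hypothetical, and `kubectl` is assumed to be installed and authenticated on the runner.

```yaml
# Sketch only: job name, namespace scheme, paths, pod label, and FORCE_FAIL
# are hypothetical. kubectl is assumed to be available on the runner.
default:
  after_script:
    # Applies only to jobs that define no after_script of their own.
    - echo "Job finished with status ${CI_JOB_STATUS}"

integration-test:
  stage: test
  script:
    - export NS="ci-test-${CI_JOB_ID}"
    - kubectl create namespace "$NS"
    - kubectl apply -n "$NS" -f k8s/
    # Verification hook: run the pipeline with FORCE_FAIL=1 to abort mid-script.
    - |
      if [ "${FORCE_FAIL:-0}" = "1" ]; then
        echo "forced failure"
        exit 1
      fi
    - ./run-tests.sh
  after_script:
    # Replaces default:after_script for this job; it does not append to it.
    # Fresh shell: NS exported in script: is gone, so derive it again.
    - NS="ci-test-${CI_JOB_ID}"
    - |
      # CI_JOB_STATUS is predefined and readable here, unlike the script exit code.
      if [ "$CI_JOB_STATUS" != "success" ]; then
        mkdir -p diagnostics
        kubectl describe pods -n "$NS" > diagnostics/describe.txt 2>&1 || true
        kubectl get events -n "$NS" > diagnostics/events.txt 2>&1 || true
        kubectl logs -n "$NS" -l app=web --all-containers --tail=500 > diagnostics/app.log 2>&1 || true
      fi
    # Idempotent teardown: succeeds even if the namespace was never created.
    - kubectl delete namespace "$NS" --ignore-not-found --wait=false || true
  artifacts:
    when: on_failure
    paths:
      - diagnostics/
    expire_in: 3 days
```

To check the behavior, one option is to run the pipeline with `FORCE_FAIL=1`, then confirm that the job log shows the `kubectl delete namespace` command running and that the failed job offers a `diagnostics/` artifact for download. Because the sketch passes `--wait=false`, namespace deletion completes asynchronously, so a follow-up `kubectl get namespace` may briefly show it as terminating.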