GitLab CI/CD Pipeline Failure Notifications & ChatOps Prompt
Design high-signal pipeline failure notifications and ChatOps controls — routed alerts, failed-job summaries, ownership tagging, and chat-driven retries — without spamming the team.
- Target user: Platform engineers reducing pipeline alert noise and MTTR
- Difficulty: Beginner
- Tools: Claude, ChatGPT
The prompt
You are an SRE who has turned noisy "pipeline failed" pings into actionable, routed notifications that the right person fixes fast.

I will provide:
- Current notification setup (GitLab Slack/email integration, default branch noise level)
- Which pipelines matter (default-branch, scheduled, release) vs which don't (every feature branch)
- Team structure / CODEOWNERS and who should be paged for what
- Complaints (too many pings, no context, alerts ignored)

Your job:
1. **Decide what deserves a notification**: recommend notifying only on default-branch failures, scheduled/nightly failures, and release-pipeline failures, never on every red feature branch. Map this to GitLab's integration settings plus `rules:`-gated notify jobs.
2. **Anatomy of a useful message**: title (project + branch + pipeline status), the specific failed job(s) and stage, a one-line tail of the failing log, the commit author, and direct links to the failed job and the MR. Avoid the bare "Pipeline #123 failed" with no context.
3. **Routing by ownership**: use CODEOWNERS / the changed paths to tag the responsible team's channel rather than a firehose channel. Show how a notify job derives the owner from `$CI_COMMIT_AUTHOR` or changed files.
4. **Dedup & quiet hours**: avoid double-notifying on retried pipelines; collapse flaky-retry noise; respect quiet hours for non-critical scheduled jobs.
5. **The notify job**: give a reusable `.notify` job (via `extends`/`include`) that runs `when: on_failure`, builds the payload, and POSTs to a Slack/webhook receiver. Keep the webhook URL in a masked variable.
6. **ChatOps controls**: show how to trigger a pipeline or retry from chat using a GitLab trigger token / Slack slash command, so on-call can re-run from their phone, with the token protected and scoped.
7. **Measure it**: track notification-to-fix time and the ignore rate, and prune any alert nobody acts on.

Output as: (a) the gating `rules:` for notify jobs, (b) the reusable `.notify` job + payload template, (c) the ChatOps retry wiring, (d) a one-page "what we notify on and why" policy.
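
To make parts (a) and (b) of the requested output concrete, here is a minimal sketch, separate from the prompt itself, of the gating decision and message anatomy that a notify job's script might implement. The environment keys are GitLab's predefined CI/CD variables. The function names, the shape of `failed_jobs` (dicts with `name`, `stage`, and `web_url`), and the choice to treat any tag pipeline as a release are assumptions made for illustration. The HTTP POST to the webhook and the MR link are left out.

```python
def should_notify(env):
    """Gate: notify only for scheduled, release, or default-branch pipelines."""
    if env.get("CI_PIPELINE_SOURCE") == "schedule":
        return True
    if env.get("CI_COMMIT_TAG"):
        # Assumption: release pipelines are the ones that run on tags.
        return True
    branch = env.get("CI_COMMIT_BRANCH")
    return bool(branch) and branch == env.get("CI_DEFAULT_BRANCH")


def build_payload(env, failed_jobs, log_tail):
    """Build a Slack-style {"text": ...} payload with the context the prompt lists.

    failed_jobs is a hypothetical list of dicts with "name", "stage", "web_url".
    log_tail is the trailing portion of the failing job's log as one string.
    """
    ref = env.get("CI_COMMIT_TAG") or env.get("CI_COMMIT_BRANCH") or "unknown ref"
    title = "{} | {} | pipeline #{} failed".format(
        env.get("CI_PROJECT_PATH", "unknown project"),
        ref,
        env.get("CI_PIPELINE_ID", "?"),
    )
    lines = ["*" + title + "*"]

    # One line per failed job, linked directly to the job page.
    for job in failed_jobs:
        lines.append(
            "- <{}|{}> (stage: {})".format(job["web_url"], job["name"], job["stage"])
        )

    # Keep only the last non-empty line of the log as a one-line tail.
    tail_lines = log_tail.strip().splitlines()
    if tail_lines:
        lines.append("Last log line: `{}`".format(tail_lines[-1]))

    lines.append("Author: {}".format(env.get("CI_COMMIT_AUTHOR", "unknown")))
    lines.append("Pipeline: {}".format(env.get("CI_PIPELINE_URL", "")))
    return {"text": "\n".join(lines)}
```

In a real job the `env` argument would typically be `os.environ`, and the returned dictionary would be JSON-encoded and sent to the webhook URL held in a masked variable. Keeping both functions free of I/O makes the gating and formatting easy to unit-test outside a pipeline.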