
GitLab Pipeline Audit & Slow Job Hunt Prompt

Audit GitLab pipelines for stale jobs, queueing delays, and runner capacity issues, and find the slow jobs that dominate the critical path.

Target user
DevOps engineers and platform leads investigating pipeline performance
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior DevOps engineer who has audited GitLab CI/CD at scale: finding pipelines stuck for hours, jobs queueing because runners are saturated, and jobs that should finish in 5 minutes taking 50.

I will provide:
- The scope: project-level audit / group-wide audit / a specific slow pipeline
- Recent timing data (pipeline durations, queue times)
- Runner inventory (count, executor type, capacity)
- The goal: find slow jobs / fix queueing / plan capacity

Your job:

1. **Identify the slowest jobs**:
   - Use the GitLab API to pull jobs across recent pipelines
   - Calculate p50/p95/p99 duration per job name
   - Look for outliers AND systematic slowness
2. **Distinguish duration from queue time**:
   - `duration` — wall-clock time the job ran
   - `queued_duration` — time the job waited for a runner before starting
   - High `queued_duration` = capacity issue (or no online runner whose tags match the job), not job slowness
3. **For capacity analysis**:
   - Concurrent jobs running vs the runners' `concurrent` (global) and `limit` (per-runner) settings in `config.toml`
   - Time-of-day patterns: are runners overloaded at peak hours?
   - Per-runner utilization
4. **For job-level slowness**:
   - Is the job's actual work slow (e.g., compile takes 20 min) or is it waiting on something (cache restore, image pull)?
   - Cache restore at job start and cache upload at job end can dominate short jobs
   - Image pulls on a cold runner add significant time to its first job
   - Upload/download time for large artifacts
5. **For pipeline-level slowness**:
   - Critical path: the longest chain of dependent jobs
   - Stage-based pipelines wait for each whole stage to finish; a DAG (`needs:`) can shorten the critical path by starting jobs as soon as their dependencies finish
   - A single bottleneck job (e.g., a 30-min e2e test) dominates the total
6. **For queueing**:
   - Runner pool fully consumed → jobs wait
   - Solution: add runners OR reduce per-job duration OR change job-runner tag matching
   - Instance vs group vs project runners: a job can only run on runners available to its project, so scope matters
7. **For stuck / stale jobs**:
   - Jobs that started but never reported completion (lost runner, network issue)
   - Job timeout (default 60 minutes per project; override per job with `timeout:`): GitLab fails jobs that exceed it
   - Manual jobs (`when: manual`) that nobody clicked
8. **For org-wide patterns**:
   - Which projects consume most runner time?
   - Which jobs are unnecessarily heavy?
   - Are caches effective?

Mark DESTRUCTIVE: cancelling running jobs without notice, reducing runner count without a capacity plan, removing caches "to test" (often makes pipelines dramatically slower).

---

Scope: [project / group / specific pipeline]
Recent timing data: [DESCRIBE]
Runner inventory: [count, executor, capacity]
Symptom: [DESCRIBE — slow / queued / stale / random]
Goal: [audit / fix / plan]

Why this prompt works

Pipeline performance is a top complaint and the slowest jobs aren’t always the obvious ones. Queue time vs duration is the key first split; many “slow pipelines” are capacity-constrained, not job-constrained.

How to use it

  1. Pull timing data first: it is the `kubectl get pods` equivalent for pipelines.
  2. Distinguish queue time from run time; they need different fixes.
  3. Find the top 5 slowest jobs and focus there.
  4. For an org-wide audit, aggregate by project.
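Step 2 can be made mechanical. The sketch below is a bash function (the name `queue_vs_run` is hypothetical) that totals run time and queue time per job name from `name|duration|queued_duration` lines, the same format the aggregation script on this page builds with jq. Labelling a job "queue-bound" when its total queue time exceeds its total run time is an assumed rule of thumb, not a GitLab metric.

```bash
#!/usr/bin/env bash
# Sketch: split total queue time from total run time per job name.
# Input on stdin: one "name|duration|queued_duration" line per job (seconds).
# Lines with a null duration or queue time are skipped.
# Assumed heuristic: "queue-bound" when queue time exceeds run time.
queue_vs_run() {
    awk -F'|' '
        $2 != "null" && $3 != "null" { run[$1] += $2; queue[$1] += $3 }
        END {
            for (n in run) {
                verdict = (queue[n] > run[n]) ? "queue-bound" : "run-bound"
                # name, total run seconds, total queue seconds, verdict
                printf "%s\t%d\t%d\t%s\n", n, run[n], queue[n], verdict
            }
        }' | sort
}
```

For example, `queue_vs_run < jobs.txt`, where `jobs.txt` is a hypothetical file of those lines. Queue-bound jobs point at runner capacity or tags; run-bound jobs point at the job itself.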

Useful commands

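# Pipeline durations (last 10 pipelines for a project; the list endpoint omits
# duration/queued_duration, so each pipeline is fetched individually)
for PID in $(curl -s --header "PRIVATE-TOKEN: <t>" \
    "https://gitlab.example.com/api/v4/projects/<id>/pipelines?per_page=10" | jq -r '.[].id'); do
    curl -s --header "PRIVATE-TOKEN: <t>" \
        "https://gitlab.example.com/api/v4/projects/<id>/pipelines/$PID" | \
        jq -r '"\(.id) \(.duration)s \(.queued_duration)s \(.status) \(.ref)"'
done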

# Per-job stats for a pipeline (per_page=100: the jobs endpoint returns 20 per page by default)
curl --header "PRIVATE-TOKEN: <t>" \
    "https://gitlab.example.com/api/v4/projects/<id>/pipelines/<pid>/jobs?per_page=100" | \
    jq -r '.[] | select(.duration != null) | "\(.duration)s queue=\(.queued_duration)s \(.name) [\(.stage)]"' | sort -nr | head

# Average duration per job name across the last 50 successful pipelines
# (expects TOKEN and GITLAB, e.g. https://gitlab.example.com, in the environment)
PROJ_ID=42
for PID in $(curl -s --header "PRIVATE-TOKEN: $TOKEN" \
    "$GITLAB/api/v4/projects/$PROJ_ID/pipelines?per_page=50&status=success" | jq -r '.[].id'); do
    curl -s --header "PRIVATE-TOKEN: $TOKEN" \
        "$GITLAB/api/v4/projects/$PROJ_ID/pipelines/$PID/jobs?per_page=100" | \
        jq -r '.[] | select(.duration != null) | "\(.name)\t\(.duration)"'
done | awk -F'\t' '{sum[$1]+=$2; count[$1]++} END {for (n in sum) print sum[n]/count[n], n}' | sort -n | tail

# Runner status (admin: /runners/all lists every runner; /runners lists only yours)
curl --header "PRIVATE-TOKEN: <t>" \
    "https://gitlab.example.com/api/v4/runners/all?per_page=100" | jq '.[] | {id, description, runner_type, paused, online, status}'

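# Find old/stuck pipelines (still running, not updated for a day; GNU date, UTC with Z so no "+" lands in the URL)
curl --header "PRIVATE-TOKEN: <t>" \
    "https://gitlab.example.com/api/v4/projects/<id>/pipelines?status=running&updated_before=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" | jq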

# Cancel a stuck pipeline (carefully)
curl --request POST --header "PRIVATE-TOKEN: <t>" \
    "https://gitlab.example.com/api/v4/projects/<id>/pipelines/<pid>/cancel"

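# Per-project runner time across a group
# (approximation: summed duration of each project's 100 most recent jobs)
curl -s --header "PRIVATE-TOKEN: <t>" \
    "https://gitlab.example.com/api/v4/groups/<id>/projects?include_subgroups=true&per_page=100" | \
    jq -r '.[] | "\(.id)\t\(.path_with_namespace)"' | \
while IFS=$'\t' read -r P NAME; do
    curl -s --header "PRIVATE-TOKEN: <t>" "https://gitlab.example.com/api/v4/projects/$P/jobs?per_page=100" | \
        jq -r --arg name "$NAME" '"\([.[].duration // 0] | add // 0)\t\($name)"'
done | sort -nr | head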
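The runner status call above tells you whether runners are online, not how busy they are. A rough utilization figure is busy seconds divided by (window length × job slots). The sketch below assumes you have already extracted `runner_id|duration` pairs from the jobs API (each job record carries a `runner` object) for a known time window; the function name `runner_utilization`, the window length, and the slots-per-runner value are all assumptions you supply.

```bash
#!/usr/bin/env bash
# Sketch: approximate per-runner utilization from job records.
# Input on stdin: "runner_id|duration" lines (seconds), e.g. produced by
#   jq -r '.[] | select(.runner != null) | "\(.runner.id)|\(.duration // 0)"'
# Arguments (assumptions you must supply):
#   $1 = length of the observation window in seconds
#   $2 = job slots per runner (its `limit`, or `concurrent` on a single-runner host)
runner_utilization() {
    local window=$1 slots=$2
    awk -F'|' -v w="$window" -v s="$slots" '
        { busy[$1] += $2 }   # total busy seconds per runner
        END {
            for (r in busy)
                printf "%s\t%.0f%%\n", r, 100 * busy[r] / (w * s)
        }' | sort -t$'\t' -k2,2nr   # busiest runner first
}
```

Runners near 100% at peak while others sit idle usually point at tag matching or runner scope rather than total capacity.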

Aggregation scripts

Find slowest jobs project-wide

#!/bin/bash
# Usage: TOKEN=<t> GITLAB=https://gitlab.example.com ./slowest-jobs.sh <project-id>
set -euo pipefail
PROJ_ID=${1:?usage: $0 <project-id>}
: "${TOKEN:?set TOKEN}" "${GITLAB:?set GITLAB}"
echo "Pulling last 30 successful pipelines..."
JOBS_FILE=$(mktemp)
trap 'rm -f "$JOBS_FILE"' EXIT
for PID in $(curl -s --header "PRIVATE-TOKEN: $TOKEN" \
    "$GITLAB/api/v4/projects/$PROJ_ID/pipelines?per_page=30&status=success" | jq -r '.[].id'); do
    # per_page=100: the jobs endpoint returns only 20 jobs per page by default
    curl -s --header "PRIVATE-TOKEN: $TOKEN" \
        "$GITLAB/api/v4/projects/$PROJ_ID/pipelines/$PID/jobs?per_page=100" | \
        jq -r '.[] | select(.duration != null) | "\(.name)|\(.duration)|\(.queued_duration)"' >> "$JOBS_FILE"
done

echo "=== Top 10 slowest by p50 ==="
# Sort by job name, then duration, and print the median per name
# (done in awk so job names containing spaces stay intact)
sort -t'|' -k1,1 -k2,2n "$JOBS_FILE" | \
    awk -F'|' '
        $1 != prev { if (c) print v[int(c/2)], prev; c = 0; prev = $1 }
        { v[c++] = $2 }
        END { if (c) print v[int(c/2)], prev }' | \
    sort -n | tail
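The script above reports only the median, while the prompt asks for p50/p95/p99. A hedged extension (the `job_percentiles` name is hypothetical) reads the same `name|duration|queued_duration` lines, drops `null` durations, and uses the nearest-rank percentile method; other percentile definitions (e.g. linear interpolation) give slightly different numbers on small samples.

```bash
#!/usr/bin/env bash
# Sketch: p50/p95/p99 duration per job name, nearest-rank method.
# Input on stdin: "name|duration|queued_duration" lines (seconds).
job_percentiles() {
    awk -F'|' '$2 != "null"' | sort -t'|' -k1,1 -k2,2n | awk -F'|' '
        # Nearest rank: the value at position ceil(p/100 * count), 1-based
        function rank(p) { return v[int((p * c + 99) / 100)] }
        function emit() {
            if (c) printf "%s\tp50=%s\tp95=%s\tp99=%s\tn=%d\n", prev, rank(50), rank(95), rank(99), c
        }
        $1 != prev { emit(); c = 0; prev = $1 }   # new job name: report the previous one
        { v[++c] = $2 }                           # durations arrive sorted ascending
        END { emit() }'
}
```

For example, `job_percentiles < "$JOBS_FILE"` inside the script, before the temporary file is removed. A p95 far above p50 suggests outliers; a high p50 suggests systematic slowness.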

Detect queueing patterns

# Across recent pipelines, find jobs that queued for more than 60 s
# (same PROJ_ID, TOKEN, and GITLAB variables as the scripts above)
for PID in $(curl -s --header "PRIVATE-TOKEN: $TOKEN" \
    "$GITLAB/api/v4/projects/$PROJ_ID/pipelines?per_page=20" | jq -r '.[].id'); do
    curl -s --header "PRIVATE-TOKEN: $TOKEN" \
        "$GITLAB/api/v4/projects/$PROJ_ID/pipelines/$PID/jobs?per_page=100" | \
        jq -r '.[] | select(.queued_duration > 60) | "\(.queued_duration)s queue \(.duration)s run \(.name)"'
done | sort -nr | head
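To confirm a peak-hour pattern (for example, everything queueing at 9 AM), bucket queue time by the hour each job was created. This sketch assumes `created_at|queued_duration` lines extracted with jq from the same jobs responses, and that `created_at` is an ISO 8601 UTC timestamp such as `2024-05-06T09:14:03.123Z`, so characters 12 and 13 hold the hour. The function name `queue_by_hour` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: average queue time per hour of day (UTC) to spot peak-hour saturation.
# Input on stdin: "created_at|queued_duration" lines, e.g. produced by
#   jq -r '.[] | select(.queued_duration != null) | "\(.created_at)|\(.queued_duration)"'
# Assumes created_at looks like 2024-05-06T09:14:03.123Z (hour at chars 12-13).
queue_by_hour() {
    awk -F'|' '
        { h = substr($1, 12, 2); sum[h] += $2; n[h]++ }
        END {
            for (h in sum)
                printf "%s:00\tavg_queue=%.0fs\tjobs=%d\n", h, sum[h] / n[h], n[h]
        }' | sort
}
```

Hours whose average queue time jumps while job counts rise are the ones to size capacity for.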

Common findings this catches

  • Single slow integration test dominating pipeline duration → split, parallelize, or move to async post-merge.
  • All jobs queue at 9 AM → runner capacity inadequate for peak; add or autoscale.
  • One project consumes 70% of runner-minutes → audit; possibly broken cache invalidation or excessive testing.
  • Manual jobs sit pending for days → either remove the manual gate or assign owners.
  • Image pull dominates startup of every job → pre-pull images on runners; use the dependency proxy.
  • Cache restore takes 3 min for a 5-min job → cache too large; trim paths.
  • Jobs hit the 1-hour timeout often → split the job (or the pipeline) into smaller pieces, or raise the project or job `timeout`.
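The first finding above depends on knowing the critical path. A minimal sketch, assuming you have exported each job's duration and upstream jobs as `name|duration|needs` lines in dependency order (every job after the jobs it needs): it computes each job's earliest finish time and walks back from the latest one. For stage-only pipelines without `needs:`, list every job of the previous stage as a dependency, since that is the implicit ordering. The function name and input format are hypothetical, and the result ignores queue time, so treat it as a lower bound on pipeline duration.

```bash
#!/usr/bin/env bash
# Sketch: longest chain of dependent jobs (critical path).
# Input on stdin: "name|duration|needs" lines; needs is a comma-separated list
# of upstream job names (empty if none), listed in dependency order.
# Ignores queue time and runner limits: the total is a lower bound.
critical_path() {
    awk -F'|' '
        {
            best = 0; from = ""
            k = split($3, deps, ",")
            for (i = 1; i <= k; i++)   # latest-finishing upstream job
                if ((deps[i] in finish) && finish[deps[i]] > best) { best = finish[deps[i]]; from = deps[i] }
            finish[$1] = best + $2; prev[$1] = from
            if (finish[$1] > total) { total = finish[$1]; last = $1 }
        }
        END {
            path = last   # walk back along the slowest upstream links
            for (j = prev[last]; j != ""; j = prev[j]) path = j " -> " path
            printf "%ds\t%s\n", total, path
        }'
}
```

Shortening any job on the printed chain shortens the pipeline; speeding up jobs off the chain does not.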

Capacity planning template

Concurrent jobs (peak observed) = N
Concurrent jobs (typical) = M
Average job duration = D minutes
Pipelines per hour (peak) = P
Jobs per pipeline (average) = J
Required concurrent runner capacity ≈ P × J × (D / 60) × buffer (compare with the observed peak N)
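One way to sanity-check the capacity number is Little's law: average jobs in flight ≈ job arrival rate × average job duration. With P pipelines per hour at peak, J jobs per pipeline (a figure you estimate), D minutes per job, and a headroom buffer, the required slots are P × J × D / 60 × buffer, rounded up. A sketch as a bash function (`required_slots` is a hypothetical name):

```bash
#!/usr/bin/env bash
# Sketch: runner slots needed at peak, via Little's law
# (average jobs in flight = job arrival rate x average job duration).
#   $1 = P, pipelines per hour at peak
#   $2 = J, jobs per pipeline (an estimate you supply)
#   $3 = D, average job duration in minutes
#   $4 = buffer, headroom multiplier (e.g. 1.25; an assumption)
required_slots() {
    awk -v p="$1" -v j="$2" -v d="$3" -v b="$4" 'BEGIN {
        slots = p * j * d * b / 60
        # Round up: a fractional slot still needs a whole runner slot
        n = (slots == int(slots)) ? slots : int(slots) + 1
        printf "%d\n", n
    }'
}
```

For example, `required_slots 12 10 6 1.25` prints 15 (12 pipelines/hour × 10 jobs × 6 minutes ÷ 60 × 1.25). If the observed peak N sits well below this, jobs are probably queueing rather than running.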

When to escalate

  • Org-wide queue issues — capacity planning meeting; budget for more runners.
  • Specific project anti-pattern — engage owners; share findings.
  • GitLab.com shared runners: if the compute-minutes quota runs out, buy more minutes; if queueing on shared runners persists, move heavy jobs to self-managed runners.
