DevOps AI ToolKit
AI for GitLab CI/CD

GitLab CI/CD Self-Hosted Runner Autoscale Cost Tradeoff Prompt

Size and tune an autoscaling self-hosted runner fleet (concurrency, idle scale, machine types, spot) against pipeline queue patterns to cut compute spend while keeping queue wait time within an SLO.

Target user: Platform engineers owning GitLab Runner infrastructure cost
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior platform engineer who balances GitLab Runner fleet cost against pipeline queue latency.

I will provide:
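- The runner `config.toml` autoscale section and the executor in use: `concurrent`, `limit`, `IdleCount`, `IdleTime`, `MaxBuilds`, and machine type for Docker Machine; the equivalent `max_instances`, `idle_count`, `idle_time`, and `max_use_count` for Docker Autoscaler/Fleeting; or the Kubernetes executor settings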
- Queue/usage data: peak vs off-peak job arrival, average job duration, current queue wait times
- The target: a queue-wait SLO (e.g. p95 < 60s) and/or a monthly budget ceiling
- Cloud instance pricing and whether spot/preemptible is allowed

Your job:

1. **Model demand:** translate job arrival rate and duration into concurrent-slot demand across the day, separating steady load from spiky peaks.
2. **Right-size capacity:** recommend `concurrent`, `limit`, and `IdleCount`/`IdleTime` so warm capacity meets the SLO at peak without paying for idle machines off-peak.
3. **Pick machine strategy:** match instance types to the job profile (CPU-bound vs memory-bound), and identify where spot/preemptible capacity is safe (idempotent, retryable jobs) and where on-demand is required (deploys).
4. **Set MaxBuilds & lifecycle:** tune `MaxBuilds` so hosts are recycled often enough to avoid disk/cache bloat, but not so often that machine creation time and cost dominate.
5. **Project the spend:** give a before/after monthly cost estimate alongside the queue wait time each configuration is expected to deliver.
6. **Verify:** list the metrics to watch (queue duration, idle ratio, spot interruption rate) and a safe rollout plan (canary runner config).

Output as: (a) demand model summary, (b) recommended config.toml values with rationale, (c) cost vs SLO table, (d) rollback/monitor plan.

Do not put protected/deploy jobs on spot capacity that can be reclaimed mid-deploy; pin those to on-demand, tagged, protected runners.
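
Outside the prompt itself, the demand model requested in step 1 can be sanity-checked by hand before or after running it. The sketch below applies Little's law (mean concurrent jobs = arrival rate × mean duration) to an arrival profile. The profile, the 6-minute job duration, the 25% headroom factor, and the `slots_needed` helper are all illustrative assumptions, not GitLab settings or measurements from a real fleet. Little's law gives mean concurrency rather than a p95 wait, so the result is only a starting point to validate against measured queue times.

```python
import math


def slots_needed(arrivals_per_hour: float, avg_job_minutes: float,
                 headroom: float = 1.25) -> int:
    """Estimate concurrent job slots via Little's law plus a headroom factor.

    mean concurrency = arrival rate (jobs/hour) * mean duration (hours).
    `headroom` is an assumed safety margin for burstiness, not a GitLab setting.
    """
    if arrivals_per_hour < 0 or avg_job_minutes < 0 or headroom < 1:
        raise ValueError("rates and durations must be >= 0, headroom >= 1")
    mean_concurrency = arrivals_per_hour * avg_job_minutes / 60
    return math.ceil(mean_concurrency * headroom)


# Hypothetical arrival profile in jobs per hour; replace with measured data.
ARRIVALS = {"off_peak": 30, "steady": 120, "peak": 400}
AVG_JOB_MINUTES = 6  # assumed average job duration

demand = {period: slots_needed(rate, AVG_JOB_MINUTES)
          for period, rate in ARRIVALS.items()}

for period, slots in demand.items():
    print(f"{period}: ~{slots} concurrent slots")
```

One way to read the result: the steady figure suggests how much warm capacity to keep, and the peak figure suggests an upper bound for `limit`. Both still need checking against the queue-wait SLO.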

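The before/after spend projection in step 5 comes down to machine-hours multiplied by price. The following is a minimal sketch under stated assumptions: one machine serves one job slot, the hourly prices and the 70% spot share are hypothetical, and the fleet sizes (50 machines always on versus 4, 15, or 50 depending on the hour) are made-up examples. It ignores machine creation time, storage, network egress, and rework caused by spot interruptions.

```python
def monthly_cost(machines_by_hour: list[int], hourly_price: float,
                 days: int = 30) -> float:
    """Cost of running machines_by_hour[h] machines in hour h of every day."""
    if len(machines_by_hour) != 24:
        raise ValueError("expected one machine count per hour of the day")
    return sum(machines_by_hour) * hourly_price * days


def blended_price(on_demand: float, spot: float, spot_fraction: float) -> float:
    """Average hourly price when `spot_fraction` of machines run on spot."""
    if not 0 <= spot_fraction <= 1:
        raise ValueError("spot_fraction must be between 0 and 1")
    return (1 - spot_fraction) * on_demand + spot_fraction * spot


# Hypothetical prices in USD per machine-hour; use your provider's real rates.
ON_DEMAND_PRICE = 0.10
SPOT_PRICE = 0.03

# Before: 50 machines kept running around the clock.
before_schedule = [50] * 24
# After: 14 off-peak hours, 8 steady hours, 2 peak hours.
# Only the totals matter for cost, so the hours are not in clock order.
after_schedule = [4] * 14 + [15] * 8 + [50] * 2

before = monthly_cost(before_schedule, ON_DEMAND_PRICE)
after = monthly_cost(after_schedule,
                     blended_price(ON_DEMAND_PRICE, SPOT_PRICE, 0.7))

print(f"before: ${before:,.2f}/month")
print(f"after:  ${after:,.2f}/month")
print(f"saving: {100 * (1 - after / before):.1f}%")
```

Pairing each projected cost with the measured or expected queue wait for the same configuration gives the cost vs SLO table the prompt asks for in its output.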