
Self-Hosted Runner Autoscaling Automation Prompt

Design autoscaling for self-hosted GitHub Actions runners — webhook-driven scale-up, idle scale-down, and ephemeral runner lifecycle — so CI capacity tracks demand without leaving zombie runners or leaking credentials between jobs.

Target user: Platform engineers running self-hosted CI runner fleets
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior platform engineer who has run self-hosted CI runner fleets and learned that a non-ephemeral runner is a security incident waiting to inherit the last job's secrets.

I will provide:
- The runner platform (Actions Runner Controller on K8s, VM autoscaling group, etc.)
- The job demand pattern (burst times, queue depth, average job duration)
- The trigger source (workflow_job webhooks, queue metrics, schedule)
- Cost and security constraints (max fleet size, isolation requirements)

Your job:

1. **Scaling signal** — choose the scale-up trigger (`workflow_job` queued webhooks vs polled queue depth) and explain the latency/accuracy trade-off for the demand pattern given.
2. **Ephemeral lifecycle** — design runners as single-use: register, run one job, deregister, and destroy, so no state or secret survives into the next job.
3. **Scale-down logic** — define idle detection, a cooldown to avoid thrash, and how to drain a runner safely (never kill a runner that is executing a job).
4. **Capacity bounds** — set min/max fleet size, per-label pools, and a hard ceiling so a webhook storm or fork-PR abuse cannot scale the fleet without bound.
5. **Registration tokens and secrets** — handle short-lived registration tokens, scope runner permissions to least privilege, and keep org/repo secrets out of the runner image.
6. **Failure handling** — define behavior for failed registration, stuck runners, and orphaned cloud resources, including a reaper that cleans up leaks.
7. **Observability and cost** — list metrics (queue wait, runner utilization, scale events, orphan count) to alert on starvation and to track cost per job.

Output as: an architecture diagram description, the scale-up/scale-down decision logic, the ephemeral runner lifecycle, the capacity-bound config, and the reaper/cleanup plan.

Require ephemeral single-use runners, a hard max-fleet ceiling, and a reaper for orphans, with scale-down that never interrupts a running job.
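
As a reference point when reviewing the model's answer, the scale-up/scale-down decision logic the prompt asks for might look roughly like the minimal Python sketch below. Everything in it is an illustrative assumption rather than part of any real autoscaler: the `Runner` and `Bounds` types, the `plan` function, and the idea that the caller supplies the queued-job count (from `workflow_job` webhooks or polling) and the time since the last scale event are all hypothetical names and inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str
    busy: bool            # True while the runner is executing a job
    idle_seconds: float   # time since the runner registered or went idle


@dataclass(frozen=True)
class Bounds:
    min_runners: int      # floor kept warm even with an empty queue
    max_runners: int      # hard ceiling, never exceeded
    idle_timeout: float   # seconds idle before a runner may be removed
    cooldown: float       # seconds to wait after any scale event


def plan(queued_jobs, runners, bounds, seconds_since_last_scale):
    """Return (number_of_runners_to_launch, names_of_runners_to_remove)."""
    idle = [r for r in runners if not r.busy]

    # Scale up: cover queued jobs that idle runners cannot absorb,
    # and restore the floor, but never exceed the hard ceiling.
    shortfall = max(0, queued_jobs - len(idle))
    floor_gap = max(0, bounds.min_runners - len(runners))
    headroom = max(0, bounds.max_runners - len(runners))
    launch = min(max(shortfall, floor_gap), headroom)
    if launch > 0:
        return launch, []

    # Scale down: only with an empty queue and after the cooldown.
    if queued_jobs > 0 or seconds_since_last_scale < bounds.cooldown:
        return 0, []

    # Only idle runners past the timeout are candidates; busy runners
    # are never selected, and the fleet never drops below the floor.
    removable = max(0, len(runners) - bounds.min_runners)
    expired = sorted(
        (r for r in idle if r.idle_seconds >= bounds.idle_timeout),
        key=lambda r: r.idle_seconds,
        reverse=True,
    )
    return 0, [r.name for r in expired[:removable]]
```

A good answer should preserve the same three invariants this sketch encodes: the launch count is capped by the ceiling regardless of queue depth, removal candidates are drawn only from idle runners, and scale-down waits out the cooldown. In this sketch scale-up is deliberately not gated by the cooldown, which is one possible design choice among several.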
