Self-Hosted Runner Autoscaling Automation Prompt
Design autoscaling for self-hosted GitHub Actions runners — webhook-driven scale-up, idle scale-down, and ephemeral runner lifecycle — so CI capacity tracks demand without leaving zombie runners or leaking credentials between jobs.
- Target user: Platform engineers running self-hosted CI runner fleets
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has run self-hosted CI runner fleets and learned that a non-ephemeral runner is a security incident waiting to inherit the last job's secrets.

I will provide:
- The runner platform (Actions Runner Controller on K8s, a VM autoscaling group, etc.)
- The job demand pattern (burst times, queue depth, average job duration)
- The trigger source (`workflow_job` webhooks, queue metrics, schedule)
- Cost and security constraints (maximum fleet size, isolation requirements)

Your job:
1. **Scaling signal**: choose the scale-up trigger (`workflow_job` webhooks with the `queued` action vs. polled queue depth) and explain the latency/accuracy trade-off for the given demand pattern.
2. **Ephemeral lifecycle**: design runners as single-use (register, run one job, deregister, destroy) so that no state or secret survives into the next job.
3. **Scale-down logic**: define idle detection, a cooldown to avoid thrashing, and how to drain a runner safely (never kill a runner that is executing a job).
4. **Capacity bounds**: set minimum/maximum fleet size, per-label pools, and a hard ceiling so that a webhook storm or fork-PR abuse cannot scale the fleet without bound.
5. **Registration tokens and secrets**: handle short-lived registration tokens, scope runner permissions to least privilege, and keep org/repo secrets out of the runner image.
6. **Failure handling**: define behavior for failed registration, stuck runners, and orphaned cloud resources, including a reaper that cleans up leaked resources.
7. **Observability and cost**: list the metrics (queue wait, runner utilization, scale events, orphan count) needed to alert on starvation and to track cost per job.

Output: an architecture diagram description, the scale-up/scale-down decision logic, the ephemeral runner lifecycle, the capacity-bound config, and the reaper/cleanup plan.

Requirements: ephemeral single-use runners, a hard max-fleet ceiling, a reaper for orphans, and scale-down that never interrupts a running job.
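Items 1, 3, and 4 of the prompt combine into one decision per label pool:
- Launch enough runners to cover queued jobs plus a warm spare.
- Clamp launches at both the pool ceiling and a fleet-wide hard ceiling.
- Remove only idle runners, and only once a cooldown has passed.

The Python below is a minimal sketch of that logic under stated assumptions. `PoolConfig`, `PoolState`, and `scale_decision` are hypothetical names. The counts are assumed to come from `workflow_job` events or queue polling. None of it reflects the API of Actions Runner Controller or any cloud provider.

```python
from dataclasses import dataclass


@dataclass
class PoolConfig:
    # Hypothetical per-label pool settings (names are illustrative).
    min_idle: int       # warm runners kept ready to cut queue wait
    max_runners: int    # per-pool ceiling
    cooldown_s: float   # minimum seconds between scale-down actions


@dataclass
class PoolState:
    queued: int             # jobs waiting for this label
    busy: int               # runners executing a job (never removed)
    idle: int               # registered runners waiting for a job
    starting: int           # runners provisioned but not yet registered
    last_scale_down: float  # time of the last scale-down, in seconds


def scale_decision(cfg: PoolConfig, st: PoolState, now: float,
                   fleet_total: int, fleet_ceiling: int) -> int:
    """Return >0 runners to launch, <0 idle runners to remove, or 0 to hold."""
    pool_total = st.busy + st.idle + st.starting
    # Non-busy capacity we want: one runner per queued job plus the warm pool.
    delta = (st.queued + cfg.min_idle) - (st.idle + st.starting)
    if delta > 0:
        # Clamp by both the pool ceiling and the fleet-wide hard ceiling, so a
        # webhook storm can never push the fleet past either bound.
        headroom = min(cfg.max_runners - pool_total, fleet_ceiling - fleet_total)
        return max(0, min(delta, headroom))
    if delta < 0 and now - st.last_scale_down >= cfg.cooldown_s:
        # Only idle runners are candidates; busy and starting runners are left alone.
        return -min(-delta, st.idle)
    return 0
```

The caller is assumed to do two things:
- Record `last_scale_down` after acting on a negative result.
- Re-check that a runner is still idle immediately before terminating it, because a job can be assigned between the decision and the termination.

Counting `starting` runners as supply keeps consecutive ticks from launching a second runner for a job whose runner is still booting.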
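Item 2's single-use lifecycle can be enforced on the controller side as a state machine with no path back to idle once a runner has taken a job. The sketch below uses hypothetical state names and a hypothetical `EphemeralRunner` record. On the runner side, single-use registration corresponds to configuring the GitHub Actions runner with its `--ephemeral` option, which this code does not model.

```python
from enum import Enum, auto


class RunnerState(Enum):
    PROVISIONING = auto()  # machine created, runner not yet registered
    IDLE = auto()          # registered as ephemeral, waiting for its one job
    RUNNING = auto()       # executing its one job
    DEREGISTERED = auto()  # removed from GitHub; machine still exists
    DESTROYED = auto()     # machine and disk deleted (terminal)
    FAILED = auto()        # runner-level failure (registration, lost host, teardown); owned by the reaper


# Allowed transitions. Only PROVISIONING leads to IDLE, so a runner that has
# executed a job can never accept a second one.
TRANSITIONS: dict[RunnerState, set[RunnerState]] = {
    RunnerState.PROVISIONING: {RunnerState.IDLE, RunnerState.FAILED},
    RunnerState.IDLE: {RunnerState.RUNNING, RunnerState.DEREGISTERED, RunnerState.FAILED},
    RunnerState.RUNNING: {RunnerState.DEREGISTERED, RunnerState.FAILED},
    RunnerState.DEREGISTERED: {RunnerState.DESTROYED, RunnerState.FAILED},
    RunnerState.FAILED: {RunnerState.DESTROYED},
    RunnerState.DESTROYED: set(),
}


class EphemeralRunner:
    """Controller-side record of one single-use runner (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = RunnerState.PROVISIONING
        self.jobs_run = 0

    def advance(self, new_state: RunnerState) -> None:
        # Reject any transition not in the table, e.g. RUNNING -> IDLE (reuse).
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: illegal transition "
                             f"{self.state.name} -> {new_state.name}")
        if new_state is RunnerState.RUNNING:
            self.jobs_run += 1
        self.state = new_state
```

Two transitions carry the design:
- `IDLE -> DEREGISTERED` is the scale-down path for a runner that never received a job.
- `FAILED` hands a runner to the reaper instead of retrying it in place, so a half-registered machine is destroyed rather than reused.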
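The reaper in item 6 can be reduced to a reconciliation between two inventories: the machines the cloud provider reports and the runners GitHub reports as registered. The sketch rests on these assumptions:
- Each machine is named after its runner (a hypothetical convention).
- `grace_s` covers boot plus registration time.
- `max_lifetime_s` is set above the longest job timeout you allow, so anything older cannot legitimately still be running a job.

`Instance`, `RegisteredRunner`, and `plan_reap` are illustrative names. Fetching the inventories and performing the deletions are left out.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str           # cloud VM/pod name, assumed to equal the runner name
    launched_at: float  # epoch seconds


@dataclass
class RegisteredRunner:
    name: str
    online: bool        # as reported by GitHub for the registration


def plan_reap(instances: list[Instance], runners: list[RegisteredRunner],
              now: float, grace_s: float,
              max_lifetime_s: float) -> tuple[list[str], list[str]]:
    """Return (machines to delete, registrations to delete); performs no deletions."""
    registered = {r.name for r in runners}
    machines = {i.name for i in instances}
    kill_machines: list[str] = []
    for inst in instances:
        age = now - inst.launched_at
        if inst.name not in registered and age > grace_s:
            # Never registered, or deregistered but not destroyed: orphaned machine.
            kill_machines.append(inst.name)
        elif age > max_lifetime_s:
            # Older than any allowed job timeout, so it cannot be running a job: stuck.
            kill_machines.append(inst.name)
    # An offline registration with no machine behind it is a stale leak in GitHub.
    drop_registrations = [r.name for r in runners
                          if r.name not in machines and not r.online]
    return kill_machines, drop_registrations
```

Keeping `plan_reap` free of side effects makes it easy to run in a dry-run mode and to export the length of both lists as the orphan count from item 7.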
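For item 7, queue wait, utilization, and cost per job can be derived from per-job timestamps plus the total lifetime of the runners in a reporting window. This is a minimal sketch with these assumptions:
- `JobRecord` and `fleet_metrics` are hypothetical names.
- The timestamps come from the `queued`, `in_progress`, and `completed` actions of `workflow_job` events.
- The price is a flat per-runner-hour rate you supply.
- The window contains at least one job.

```python
import math
from dataclasses import dataclass


@dataclass
class JobRecord:
    queued_at: float     # assumed: time of the workflow_job "queued" event
    started_at: float    # assumed: time of the "in_progress" event
    completed_at: float  # assumed: time of the "completed" event


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    idx = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[idx]


def fleet_metrics(jobs: list[JobRecord], runner_seconds: float,
                  price_per_runner_hour: float, orphan_count: int,
                  max_wait_s: float) -> dict:
    """Summarize one window; runner_seconds is the total billed runner lifetime."""
    waits = [j.started_at - j.queued_at for j in jobs]
    busy_seconds = sum(j.completed_at - j.started_at for j in jobs)
    p95_wait = nearest_rank(waits, 95)
    total_cost = runner_seconds / 3600 * price_per_runner_hour
    return {
        "queue_wait_p95_s": p95_wait,
        "starved": p95_wait > max_wait_s,  # alert: jobs wait too long for a runner
        "utilization": busy_seconds / runner_seconds,
        "cost_per_job": total_cost / len(jobs),
        "orphan_count": orphan_count,      # e.g. from the reaper's last pass
    }
```

Each ephemeral runner serves at most one job. Utilization below 1 therefore measures time spent booting, registering, and waiting idle, which is roughly the price of the warm pool and of scale-down cooldowns.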