GitLab Runner Autoscaling with Fleeting Plugin Prompt
Design and operate the modern GitLab Runner autoscaler: fleeting plugins (AWS, GCP, Azure), when the Kubernetes executor makes them unnecessary, capacity tuning, spot/preemptible usage, and idle scale-down.
- Target user: Platform engineers operating GitLab Runner autoscalers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has migrated GitLab Runner autoscaling from the deprecated docker-machine to the modern fleeting plugin. You know AWS/GCP/Azure autoscaling group integration, capacity policies, and spot usage at scale.

I will provide:
- The cloud provider (AWS, GCP, Azure)
- Current setup (legacy docker-machine, fleeting, or new install)
- Workload patterns (peak/off-peak, job duration distribution, concurrency)
- The goal: install / migrate / tune

Your job:

1. **Understand the architecture**:
   - **Fleeting** is a plugin system; GitLab Runner uses it to spin up VM-based workers on demand
   - Plugins: `aws`, `googlecloud`, `azure`, `static`
   - Replaces docker-machine (deprecated since 2024)
   - Each "instance" is a full VM that runs jobs; it can have its own Docker daemon and executor config
2. **Key concepts**:
   - **Capacity per worker**: concurrent jobs per VM (controlled by `capacity_per_instance`)
   - **`max_instances`**: upper bound on the VMs the runner will request; keep it at or below the ASG/MIG/VMSS maximum
   - **`max_use_count`**: how many jobs a VM runs before it is replaced (mitigates state leakage)
   - **`idle_count` / `idle_time`**: minimum idle capacity and idle scale-down delay, set per `[[runners.autoscaler.policy]]`
   - **Group `min` / `max` size**: limits set on the cloud scaling group itself
3. **Migration from docker-machine**:
   - Install the fleeting plugin on the runner host
   - Update `config.toml`: change the executor to `docker-autoscaler` and replace `[runners.machine]` with `[runners.autoscaler]`
   - Re-create the ASG/MIG/VMSS with fleeting-friendly config
   - Cutover: drain old, switch traffic
4. **For AWS**:
   - Set `plugin = "aws"` under `[runners.autoscaler]`
   - Requires IAM permissions for EC2 and ASG operations
   - Launch template defines AMI, instance type, user-data
   - Spot supported via mixed instances policy
5. **For Kubernetes executor with autoscaler**:
   - Different: the Kubernetes executor uses pods (no VM autoscaling needed at runner level)
   - K8s itself handles autoscaling (cluster autoscaler / Karpenter scales nodes for new pods)
   - Fleeting isn't needed in this case
6. **For capacity tuning**:
   - Peak concurrent jobs observed (P) → size total job slots (`max_instances × capacity_per_instance`) to about P × 1.2
   - Average job duration → cost of leaving idle vs scale-up latency
   - For bursty workloads: keep `idle_count > 0` to absorb peaks
   - Spot reclaim: ensure jobs are tolerant (rerun on failure)
7. **For cost optimization**:
   - Spot/preemptible for non-critical CI
   - Larger instance types with higher capacity per worker
   - Aggressive scale-down for off-peak
   - Pre-pull common Docker images via user-data to speed up the first job
8. **For monitoring**:
   - Fleeting plugin emits Prometheus metrics
   - Per-instance lifecycle events
   - Capacity utilization dashboard

Mark DESTRUCTIVE: scaling down idle VMs aggressively (jobs queue waiting for cold-start), spot instances for long-running critical jobs (reclaim mid-job), modifying launch template while jobs running (next-replaced VM uses new config).

---

Cloud provider: [AWS / GCP / Azure / K8s]
Current setup: [docker-machine legacy / fleeting / new]
Workload pattern: [DESCRIBE: peak/off-peak, concurrency]
Goal: [install / migrate / tune cost / tune performance]
Why this prompt works
Runner autoscaling has changed significantly with fleeting. Many teams still run docker-machine and don’t realize it’s deprecated. This prompt walks through the modern setup.
How to use it
- Install the fleeting plugin on the runner manager host, not on the worker VMs.
- Configure plugin per cloud.
- Tune capacity based on observed peaks.
- Monitor via Prometheus.
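The "tune capacity based on observed peaks" step can be sketched as a small calculation. This is a minimal sketch, not an official formula: it applies the prompt's 20% headroom rule to job slots and then divides by `capacity_per_instance`, because each VM runs several jobs. The function name `size_autoscaler` and the sample numbers are hypothetical.

```python
def size_autoscaler(peak_jobs: int, capacity_per_instance: int, headroom_pct: int = 20) -> dict:
    """Turn an observed peak of concurrent jobs into autoscaler limits.

    Integer arithmetic is used throughout so rounding is always upward
    and never affected by floating-point error.
    """
    if peak_jobs < 0 or capacity_per_instance < 1 or headroom_pct < 0:
        raise ValueError("peak_jobs and headroom_pct must be >= 0, capacity_per_instance >= 1")
    # Job slots needed at peak, plus headroom, rounded up.
    target_jobs = (peak_jobs * (100 + headroom_pct) + 99) // 100
    # VMs needed to provide that many slots, rounded up.
    max_instances = (target_jobs + capacity_per_instance - 1) // capacity_per_instance
    return {
        "target_concurrent_jobs": target_jobs,
        "max_instances": max_instances,
        "job_slots": max_instances * capacity_per_instance,
    }


# Hypothetical workload: 150 concurrent jobs at peak, 4 jobs per VM.
plan = size_autoscaler(peak_jobs=150, capacity_per_instance=4)
print(plan)
```

The global `concurrent` setting in `config.toml` also caps the number of jobs the runner process accepts, so it needs to be at least as large as the job slots computed here.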
Useful commands
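# Install GitLab Runner on the manager host (Debian/Ubuntu, GitLab package repository already configured)
sudo apt install gitlab-runner
# Install the plugins referenced in config.toml (recent GitLab Runner versions; check the docs for yours)
sudo gitlab-runner fleeting install
# List installed plugins and their versions
sudo gitlab-runner fleeting list
# Verify the runner
gitlab-runner --version
sudo gitlab-runner verify
# Manually installed plugin binaries must be on PATH (location depends on how you installed them)
ls /usr/local/bin/fleeting-plugin-*
# Check for legacy docker-machine configuration
grep -n "MachineDriver" /etc/gitlab-runner/config.toml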
Config patterns
AWS fleeting
[[runners]]
  name = "aws-autoscaler"
  url = "https://gitlab.example.com"
  token = "TOKEN"
  executor = "docker-autoscaler"
  [runners.docker]
    image = "alpine:latest"
  [runners.autoscaler]
    plugin = "aws"
    capacity_per_instance = 4
    max_use_count = 10
    max_instances = 50
    # [runners.autoscaler.connector_config] (how the runner connects to worker VMs) is omitted here
    [runners.autoscaler.plugin_config]
      name = "gitlab-runner-pool" # ASG name
      region = "us-east-1"
    # Default policy (off-peak): no idle VMs
    [[runners.autoscaler.policy]]
      idle_count = 0
      idle_time = "5m"
    # Peak-hours override; listed last because the last matching policy should win (verify against the runner docs)
    [[runners.autoscaler.policy]]
      idle_count = 2
      idle_time = "20m"
      periods = ["* 9-17 * * mon-fri"] # unix-cron format, five fields
GCP fleeting
[[runners]]
  # name, url, token, and executor as in the AWS example
  [runners.autoscaler]
    plugin = "googlecloud"
    capacity_per_instance = 4
    max_use_count = 10
    max_instances = 50
    [runners.autoscaler.plugin_config]
      name = "gitlab-runner-mig" # managed instance group name
      project = "my-project"
      zone = "us-central1-a"
Static (no autoscaling, for fixed worker pool)
[[runners]]
  [runners.autoscaler]
    plugin = "static"
    capacity_per_instance = 1
    [runners.autoscaler.plugin_config]
      # Illustrative only: check the static plugin README for the exact `instances` schema
      instances = ["host1.example.com:22", "host2.example.com:22"]
ASG / Launch Template (AWS)
# CloudFormation (YAML) for the launch template and Auto Scaling group
Resources:
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: gitlab-runner-workers
      LaunchTemplateData:
        ImageId: ami-XXX # Ubuntu 22.04 with Docker pre-installed
        InstanceType: m6a.large
        IamInstanceProfile:
          Name: gitlab-runner-worker
        UserData: !Base64 |
          #!/bin/bash
          # Pre-pull common images for faster first-job
          docker pull node:20-alpine
          docker pull python:3.12-slim
          # Configure runner OS
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: gitlab-runner-pool
      MinSize: "0"
      MaxSize: "50"
      DesiredCapacity: "0" # fleeting sets the desired capacity; do not attach scaling policies
      HealthCheckType: EC2
      # VPCZoneIdentifier (subnet IDs) or AvailabilityZones is also required; omitted here
      # Specify either LaunchTemplate or MixedInstancesPolicy on the group, not both
      MixedInstancesPolicy:
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref LaunchTemplate
            # CloudFormation does not accept $Latest here; reference the version number instead
            Version: !GetAtt LaunchTemplate.LatestVersionNumber
          Overrides:
            - InstanceType: m6a.large
            - InstanceType: m5a.large
            - InstanceType: m6i.large
        InstancesDistribution:
          OnDemandBaseCapacity: 1 # the first instance is always on-demand
          OnDemandPercentageAboveBaseCapacity: 25
          SpotAllocationStrategy: capacity-optimized
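The prompt states that the AWS plugin requires IAM permissions for EC2 and ASG operations. The sketch below builds one possible policy document for the runner manager. The action list is an assumption based on the AWS plugin's documented requirements and should be checked against the plugin README; the function name, account ID, and ARN are hypothetical placeholders.

```python
import json

# Hypothetical ARN for the "gitlab-runner-pool" ASG; replace with your own.
EXAMPLE_ASG_ARN = (
    "arn:aws:autoscaling:us-east-1:123456789012:"
    "autoScalingGroup:*:autoScalingGroupName/gitlab-runner-pool"
)


def fleeting_iam_policy(asg_arn: str) -> dict:
    """Return an IAM policy document for the runner manager (assumed action list)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Scale the group and retire individual workers.
                "Effect": "Allow",
                "Action": [
                    "autoscaling:SetDesiredCapacity",
                    "autoscaling:TerminateInstanceInAutoScalingGroup",
                ],
                "Resource": asg_arn,
            },
            {
                # Describe* calls do not support resource-level scoping.
                "Effect": "Allow",
                "Action": [
                    "autoscaling:DescribeAutoScalingGroups",
                    "ec2:DescribeInstances",
                ],
                "Resource": "*",
            },
            {
                # Credentials for connecting to workers; scope this down in production.
                "Effect": "Allow",
                "Action": [
                    "ec2:GetPasswordData",
                    "ec2-instance-connect:SendSSHPublicKey",
                ],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(fleeting_iam_policy(EXAMPLE_ASG_ARN), indent=2))
```

Scoping the write actions to a single ASG ARN keeps a misconfigured runner from resizing unrelated groups.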
Migration from docker-machine
# OLD: docker-machine (executor = "docker+machine")
[runners.machine]
  IdleCount = 2
  MachineDriver = "amazonec2"
  MachineName = "gitlab-runner-%s"
  MachineOptions = ["amazonec2-region=us-east-1", ...]
# NEW: fleeting (set executor = "docker-autoscaler" and replace [runners.machine] with [runners.autoscaler])
[runners.autoscaler]
  plugin = "aws"
  capacity_per_instance = 4
  max_use_count = 10
  max_instances = 50
  [runners.autoscaler.plugin_config]
    name = "gitlab-runner-pool"
    region = "us-east-1"
  [[runners.autoscaler.policy]]
    idle_count = 2 # replaces IdleCount
Common findings this catches
- Jobs queue every Monday morning → `idle_count` too low for peak; add a scheduled policy.
- Spot reclaim killing jobs mid-run → use on-demand for critical, spot for retryable.
- Cold-start every job → `max_use_count = 1` is over-conservative; raise to 5-10.
- Idle VMs costing money off-hours → time-window policy: zero idle outside business hours.
- First-job-of-day slow due to image pull → pre-pull common images in user-data.
- ASG max reached but jobs still queue → raise `max_instances` and the ASG maximum together; revisit the capacity plan.
- Worker host failing health checks → user-data error; check CloudWatch / instance logs.
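The "idle VMs costing money off-hours" finding can be quantified with simple arithmetic. This sketch compares keeping idle VMs around the clock with keeping them only during a 09:00-17:59 weekday window. The hourly price is a made-up placeholder, not a real AWS rate, and the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24          # 168
PEAK_HOURS_PER_WEEK = 9 * 5      # hours 9-17 inclusive, Monday to Friday


def weekly_idle_cost_cents(idle_count: int, hours: int, price_cents_per_hour: int) -> int:
    """Cost of keeping `idle_count` VMs warm for `hours`, in cents."""
    return idle_count * hours * price_cents_per_hour


# Placeholder price: 10 cents per VM-hour (assumption for illustration only).
PRICE_CENTS = 10

always_idle = weekly_idle_cost_cents(2, HOURS_PER_WEEK, PRICE_CENTS)
peak_only = weekly_idle_cost_cents(2, PEAK_HOURS_PER_WEEK, PRICE_CENTS)
savings = always_idle - peak_only

print(f"always idle: {always_idle} cents/week")
print(f"peak only:   {peak_only} cents/week")
print(f"savings:     {savings} cents/week")
```

The saving has to be weighed against cold-start latency for the first off-peak job, which this sketch does not model.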
When to escalate
- Cloud capacity unavailable (instance types out of stock in AZ): coordinate with the cloud team.
- Spot pricing spike: fall back to on-demand or scale down.
- Persistent autoscaler errors: engage the fleeting plugin maintainers (open an issue with logs).
Related prompts
- GitLab Pipeline Audit & Slow Job Hunt Prompt: Audit GitLab pipelines for stale jobs, queueing delays, runner capacity issues, and find the slow jobs that dominate critical path.
- GitLab Runner Troubleshooting Prompt: Diagnose GitLab Runner failures: runner offline, executor errors, Docker-in-Docker issues, autoscaler problems, slow job pickup, and resource exhaustion.
- Kubernetes Cluster Autoscaler / Karpenter Debug Prompt: Diagnose cluster autoscaling: scale-up delay, scale-down protection, node group selection, pod doesn't fit any template, Karpenter NodePool/NodeClaim issues.