
GitLab Runner Autoscaling with Fleeting Plugin Prompt

Design and operate the modern GitLab Runner autoscaler: fleeting plugins (AWS/GCP/Azure), capacity tuning, spot/preemptible usage, and idle scale-down, plus when the Kubernetes executor makes fleeting unnecessary.

Target user
Platform engineers operating GitLab Runner autoscalers
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior platform engineer who has migrated GitLab Runner autoscaling from the deprecated docker-machine to the modern fleeting plugin. You know AWS/GCP/Azure autoscaling group integration, capacity policies, and spot usage at scale.

I will provide:
- The cloud provider (AWS, GCP, Azure)
- Current setup (legacy docker-machine, fleeting, or new install)
- Workload patterns (peak/off-peak, job duration distribution, concurrency)
- The goal: install / migrate / tune

Your job:

1. **Understand the architecture**:
   - **Fleeting** is a plugin system; GitLab Runner uses it to spin up VM-based workers on demand
   - Plugins: `aws`, `googlecloud`, `azure`, `static`
   - Replaces the docker-machine executor (deprecated in GitLab 17.5, 2024)
   - Each "instance" is a full VM that runs jobs; can have its own Docker daemon, executor config
2. **Key concepts**:
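   - **Capacity per instance**: concurrent jobs per VM (set by `capacity_per_instance`)
   - **`max_instances`**: upper bound on the VMs the autoscaler will run; keep it at or below the ASG/MIG/VMSS maximum size
   - **`max_use_count`**: how many jobs a VM runs before it is replaced (mitigates state leakage)
   - **`idle_count` / `idle_time`**: idle capacity to keep warm and how long an idle VM waits before scale-down (set per `[[runners.autoscaler.policy]]`)
   - **ASG/MIG/VMSS `min` / `max`**: the cloud-side group limits; the runner cannot scale past the group's `max`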
3. **Migration from docker-machine**:
   - Install the fleeting plugin binary on the runner manager host
   - Update `config.toml`: change `executor` from `docker+machine` to `docker-autoscaler` and replace `[runners.machine]` with `[runners.autoscaler]`
   - Re-create ASG/MIG/VMSS with fleeting-friendly config
   - Cutover: drain old, switch traffic
4. **For AWS**:
   - Set `plugin = "aws"` under `[runners.autoscaler]`; the ASG name and region go in `[runners.autoscaler.plugin_config]`
   - Requires IAM permissions for EC2 and ASG operations
   - Launch template defines AMI, instance type, user-data
   - Spot supported via mixed instances policy
5. **For Kubernetes executor with autoscaler**:
   - Different: Kubernetes executor uses pods (no VM autoscaling needed at runner level)
   - K8s itself handles autoscaling (cluster autoscaler / Karpenter scales nodes for new pods)
   - Fleeting isn't needed in this case
6. **For capacity tuning**:
   - Peak concurrent jobs observed (P) → set `max_instances ≈ ceil(P × 1.2 / capacity_per_instance)`, since each VM runs `capacity_per_instance` jobs at once
   - Average job duration → cost of leaving idle vs scale-up latency
   - For bursty workloads: keep `idle.count > 0` to absorb peaks
   - Spot reclaim: ensure jobs are tolerant (rerun on failure)
7. **For cost optimization**:
   - Spot/preemptible for non-critical CI
   - Larger instance types with higher capacity per worker
   - Aggressive scale-down for off-peak
   - Pre-pull common Docker images via user-data to speed first-job
8. **For monitoring**:
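   - GitLab Runner exposes Prometheus metrics, including autoscaler and fleeting metrics, when its metrics endpoint (`listen_address`) is enabled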
   - Per-instance lifecycle events
   - Capacity utilization dashboard

Mark DESTRUCTIVE: scaling down idle VMs aggressively (jobs queue waiting for cold-start), spot instances for long-running critical jobs (reclaim mid-job), modifying launch template while jobs running (next-replaced VM uses new config).

---

Cloud provider: [AWS / GCP / Azure / K8s]
Current setup: [docker-machine legacy / fleeting / new]
Workload pattern: [DESCRIBE — peak/off-peak, concurrency]
Goal: [install / migrate / tune cost / tune performance]

Why this prompt works

Runner autoscaling has changed significantly with fleeting. Many teams still run docker-machine and don’t realize it’s deprecated. This prompt walks through the modern setup.

How to use it

  1. Install the fleeting plugin on the runner manager host (not on the worker VMs).
  2. Configure plugin per cloud.
  3. Tune capacity based on observed peaks.
  4. Monitor via Prometheus.
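
Step 3 is mostly arithmetic. As a minimal sketch, assuming the 20% headroom rule from the prompt and that each VM runs `capacity_per_instance` jobs at once, the instance ceiling could be estimated as follows. The function name and the example numbers are hypothetical, not part of GitLab Runner.

```python
def size_max_instances(peak_jobs: int, capacity_per_instance: int, headroom_pct: int = 20) -> int:
    """Instances needed to run peak_jobs plus headroom, rounded up."""
    if peak_jobs < 0 or headroom_pct < 0 or capacity_per_instance < 1:
        raise ValueError("peak_jobs and headroom_pct must be >= 0; capacity_per_instance must be >= 1")
    # Work in integers (job slots x 100) so the ceiling division is exact.
    job_slots_x100 = peak_jobs * (100 + headroom_pct)
    return -(-job_slots_x100 // (100 * capacity_per_instance))


if __name__ == "__main__":
    # 160 peak jobs with 20% headroom is 192 job slots; at 4 jobs per VM that is 48 VMs.
    print(size_max_instances(peak_jobs=160, capacity_per_instance=4))
```

The result is a starting point for `max_instances`; the cloud-side group maximum needs to be at least as large.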

Useful commands

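# Install GitLab Runner on the runner manager host (Debian/Ubuntu, GitLab package repository configured)
sudo apt install gitlab-runner

# Install the fleeting plugin(s) referenced by `plugin = ...` in config.toml
# (available in recent GitLab Runner releases; on older ones, download the plugin binary manually; check docs)
sudo gitlab-runner fleeting install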

# Verify
gitlab-runner --version
gitlab-runner verify

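# Manually installed plugin binaries must be on the runner's PATH (location varies by install method)
ls /usr/local/bin/fleeting-plugin-*

# Find leftover docker-machine configuration before migrating
grep -nE 'MachineDriver|docker\+machine' /etc/gitlab-runner/config.toml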
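
When several runner hosts need auditing, the same check can be scripted. The sketch below is a plain-text scan, not a TOML parser: it flags lines that suggest a docker-machine setup. The pattern list and the `find_legacy_lines` helper are illustrative assumptions and may miss unusual formatting.

```python
import re

# Lines that typically indicate a legacy docker-machine runner.
LEGACY_PATTERNS = [
    re.compile(r'^\s*executor\s*=\s*"docker\+machine"'),
    re.compile(r'^\s*\[runners\.machine\]'),
    re.compile(r'^\s*MachineDriver\s*='),
]


def find_legacy_lines(config_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each legacy-looking line."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in LEGACY_PATTERNS):
            hits.append((number, line.strip()))
    return hits


SAMPLE = '''\
[[runners]]
  name = "legacy"
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 2
    MachineDriver = "amazonec2"
'''

if __name__ == "__main__":
    for number, line in find_legacy_lines(SAMPLE):
        print(f"{number}: {line}")
```

To scan a real file, pass the contents of `/etc/gitlab-runner/config.toml` to `find_legacy_lines` instead of `SAMPLE`.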

Config patterns

AWS fleeting

[[runners]]
  name = "aws-autoscaler"
  url = "https://gitlab.example.com"
  token = "TOKEN"
  executor = "docker-autoscaler"

  [runners.docker]
    image = "alpine:latest"

  [runners.autoscaler]
    plugin = "aws"
    capacity_per_instance = 4
    max_use_count = 10
    max_instances = 50

    [runners.autoscaler.plugin_config]
      name             = "gitlab-runner-pool"     # ASG name
      region           = "us-east-1"

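    # Default policy first; a later matching policy is assumed to take precedence (verify in the Runner docs)
    [[runners.autoscaler.policy]]
      idle_count = 0
      idle_time = "5m"                        # off-peak: no idle

    [[runners.autoscaler.policy]]
      idle_count = 2
      idle_time = "20m"
      periods = ["* 9-17 * * mon-fri"]        # peak hours; 5-field cron (minute hour day-of-month month day-of-week)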

GCP fleeting

[[runners]]
  [runners.autoscaler]
    plugin = "googlecloud"
    capacity_per_instance = 4
    max_use_count = 10
    max_instances = 50

    [runners.autoscaler.plugin_config]
      project          = "my-project"
      name             = "gitlab-runner-mig"      # managed instance group name
      zone             = "us-central1-a"

Static (no autoscaling, for fixed worker pool)

[[runners]]
  [runners.autoscaler]
    plugin = "static"
    capacity_per_instance = 1

    [runners.autoscaler.plugin_config]
      instances = ["host1.example.com:22", "host2.example.com:22"]

ASG / Launch Template (AWS)

# CloudFormation (YAML) for the launch template and Auto Scaling group; translate to Terraform as needed
LaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: gitlab-runner-workers
    LaunchTemplateData:
      ImageId: ami-XXX                  # Ubuntu 22.04 with Docker pre-installed
      InstanceType: m6a.large
      IamInstanceProfile:
        Name: gitlab-runner-worker
      UserData: !Base64 |
        #!/bin/bash
        # Pre-pull common images for faster first-job
        docker pull node:20-alpine
        docker pull python:3.12-slim
        # Configure runner OS

AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    AutoScalingGroupName: gitlab-runner-pool
    MinSize: 0
    MaxSize: 50
    DesiredCapacity: 2
    HealthCheckType: EC2
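    # Specify either LaunchTemplate or MixedInstancesPolicy on the group, not both
    MixedInstancesPolicy:
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateName: gitlab-runner-workers
          # CloudFormation does not accept $Latest here; reference the template's version attribute
          Version: !GetAtt LaunchTemplate.LatestVersionNumber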
        Overrides:
        - InstanceType: m6a.large
        - InstanceType: m5a.large
        - InstanceType: m6i.large
      InstancesDistribution:
        OnDemandBaseCapacity: 1               # always keep 1 on-demand
        OnDemandPercentageAboveBaseCapacity: 25
        SpotAllocationStrategy: capacity-optimized

Migration from docker-machine

# OLD: docker-machine (the runner uses executor = "docker+machine")
[runners.machine]
  IdleCount = 2
  MachineDriver = "amazonec2"
  MachineName = "gitlab-runner-%s"
  MachineOptions = ["amazonec2-region=us-east-1", ...]

# NEW: fleeting (set executor = "docker-autoscaler" and replace [runners.machine] with [runners.autoscaler])
[runners.autoscaler]
  plugin = "aws"
  capacity_per_instance = 4
  max_use_count = 10
  max_instances = 50
  [runners.autoscaler.plugin_config]
    name   = "gitlab-runner-pool"
    region = "us-east-1"
  [[runners.autoscaler.policy]]
    idle_count = 2
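
For fleets with many runner definitions, the mapping above can be drafted mechanically and then reviewed by hand. The sketch below assumes approximate correspondences that are not an official migration table: legacy `IdleCount` to `idle_count`, `IdleTime` (seconds) to an `idle_time` duration string, and `MaxBuilds` to `max_use_count`. The function and its arguments are hypothetical, and `MachineOptions` still have to be moved into the launch template manually.

```python
def machine_to_autoscaler(machine: dict, asg_name: str, region: str, capacity_per_instance: int = 1) -> dict:
    """Draft a [runners.autoscaler] table from legacy [runners.machine] settings."""
    policy = {"idle_count": machine.get("IdleCount", 0)}
    if "IdleTime" in machine:
        # Legacy IdleTime is in seconds; the autoscaler takes a duration string.
        policy["idle_time"] = f"{machine['IdleTime']}s"

    autoscaler = {
        "plugin": "aws",
        "capacity_per_instance": capacity_per_instance,
        "plugin_config": {"name": asg_name, "region": region},
        "policy": [policy],
    }
    # Assumed mapping: MaxBuilds (jobs before a machine is removed) -> max_use_count.
    if machine.get("MaxBuilds", 0) > 0:
        autoscaler["max_use_count"] = machine["MaxBuilds"]
    return autoscaler


if __name__ == "__main__":
    legacy = {"IdleCount": 2, "IdleTime": 1800, "MaxBuilds": 10, "MachineDriver": "amazonec2"}
    print(machine_to_autoscaler(legacy, asg_name="gitlab-runner-pool", region="us-east-1", capacity_per_instance=4))
```

docker-machine ran one job per VM, so `capacity_per_instance` has no legacy equivalent and is chosen during the migration.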

Common findings this catches

  • Jobs queue every Monday morning → `idle_count` too low for peak; add a scheduled policy.
  • Spot reclaim killing jobs mid-run → use on-demand for critical, spot for retryable.
  • Cold-start every job → `max_use_count = 1` is over-conservative; raise to 5-10.
  • Idle VMs costing money off-hours → time-window policy: zero idle outside business hours.
  • First-job-of-day slow due to image pull → pre-pull common images in user-data.
  • ASG max reached but jobs still queue → raise `max_instances` and the ASG maximum; revisit the capacity plan.
  • Worker host failing health checks → user-data error; check CloudWatch / instance logs.
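
The cold-start finding can be quantified before changing anything. A minimal sketch, assuming every VM is retired only after `max_use_count` jobs (idle scale-down and spot reclaim add launches on top, so this is a lower bound). The daily job volume is a made-up figure for illustration.

```python
def min_vm_launches(jobs_per_day: int, max_use_count: int) -> int:
    """Lower bound on VM launches per day for a given max_use_count."""
    if jobs_per_day < 0 or max_use_count < 1:
        raise ValueError("jobs_per_day must be >= 0 and max_use_count >= 1")
    # Ceiling division: a partially used VM still had to be launched.
    return -(-jobs_per_day // max_use_count)


if __name__ == "__main__":
    jobs_per_day = 1000  # hypothetical workload
    for max_use_count in (1, 5, 10):
        launches = min_vm_launches(jobs_per_day, max_use_count)
        print(f"max_use_count={max_use_count}: at least {launches} launches/day")
```

Each avoided launch saves one boot plus image pull, which is the latency jobs see as a cold start.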

When to escalate

  • Cloud capacity unavailable (instance types out of stock in AZ) — coordinate with cloud team.
  • Spot pricing spike — fallback to on-demand or scale down.
  • Persistent autoscaler errors — engage fleeting plugin maintainers (open issue with logs).
