Autoscaling GitLab Runners With Fleeting on AWS Spot

For years, autoscaling GitLab runners meant Docker Machine — a tool GitLab forked and kept on life support long after upstream abandoned it. If you ever debugged a wedged docker-machine state file at 1am, you remember the pain. That era is over. The modern answer is Fleeting, GitLab’s autoscaling abstraction with provider plugins for AWS, GCP, and Azure. It’s a cleaner model, and paired with spot instances it makes elastic CI genuinely cheap.

Here’s how I set it up and the gotchas that actually matter.

What Fleeting changes

Fleeting separates two concerns that Docker Machine smashed together. A plugin (e.g. fleeting-plugin-aws) knows how to talk to a cloud autoscaling group and produce a list of instances. The taskscaler inside GitLab Runner decides how many instances you need and how many jobs to pack onto each. The runner SSHes into those instances and runs jobs with the docker-autoscaler or instance executor.

The practical upside: the cloud provider’s autoscaling group owns instance lifecycle. No more bespoke state machine in the runner. When an instance dies, the ASG replaces it; the runner just sees the new list.

The AWS pieces you need first

Before touching runner config, stand up the AWS side:

An Auto Scaling Group with min size 0, a sane max, and a launch template using a CI-friendly AMI (Docker pre-installed saves boot time).
A mixed instances policy so the ASG can pull spot capacity across several instance types — diversification is what keeps spot from getting starved.
An IAM role for the runner manager that lets it describe the ASG and its instances, set the ASG’s desired capacity, and terminate instances in that group.

Keep the ASG’s own scaling policies off. Fleeting drives desired capacity directly; competing scaling policies will fight it.
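To make the list above concrete, here is a minimal sketch of the ASG settings expressed as keyword arguments for boto3's create_auto_scaling_group call. The helper name spot_asg_kwargs, the launch template name, the instance types, and the subnet IDs are all placeholders I made up for illustration, and the all-spot split with the price-capacity-optimized strategy is one reasonable choice rather than the only one.

```python
def spot_asg_kwargs(name, launch_template_name, instance_types, subnet_ids, max_size):
    """Build kwargs for an all-spot, scale-from-zero ASG (illustrative helper)."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": 0,            # scale to zero when idle
        "MaxSize": max_size,     # the "sane max"
        "DesiredCapacity": 0,    # Fleeting raises this as jobs arrive
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # One override per instance type: more pools, less starvation.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # Assumption: 100% spot, no on-demand base.
                "OnDemandBaseCapacity": 0,
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    }


kwargs = spot_asg_kwargs(
    name="ci-runners-asg",
    launch_template_name="ci-runner-template",       # hypothetical
    instance_types=["m5.large", "m5a.large", "m6i.large"],
    subnet_ids=["subnet-aaaa", "subnet-bbbb"],       # hypothetical
    max_size=10,
)
print(len(kwargs["MixedInstancesPolicy"]["LaunchTemplate"]["Overrides"]))
```

With credentials in place, the dict could be passed as boto3.client("autoscaling").create_auto_scaling_group(**kwargs). Note that no scaling policies are attached, which matches the advice to let Fleeting own desired capacity.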

The runner config

Install the plugin alongside the runner binary, then configure the autoscaler block:

concurrent = 40
check_interval = 3

[[runners]]
  name = "fleeting-spot"
  url = "https://gitlab.com/"
  token = "glrt-REDACTED"
  executor = "docker-autoscaler"

  [runners.docker]
    image = "alpine:3.20"

  [runners.autoscaler]
    plugin = "aws"
    capacity_per_instance = 4
    max_use_count = 20
    max_instances = 10

    [runners.autoscaler.plugin_config]
      name = "ci-runners-asg"
      region = "us-east-1"

    [[runners.autoscaler.policy]]
      idle_count = 2
      idle_time = "20m0s"

The knobs that matter:

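capacity_per_instance: how many jobs each instance runs concurrently. Set it to what one instance can actually handle given your job sizes, not what looks efficient on paper.
max_use_count: retire an instance after it’s run this many jobs. This is your defense against state leaking between builds on a reused instance. I keep it modest (20 in the config above).
idle_count / idle_time: idle_count is the idle job capacity to keep immediately available, counted in job slots rather than instances, so idle_count = 2 with capacity_per_instance = 4 keeps a single instance warm. idle_time is how long an instance may sit idle before it is removed. Together they keep the first jobs of the morning from eating a cold-boot delay.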
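A quick back-of-the-envelope sketch of how these knobs interact, using the numbers from the config above. The function names are mine, and the 1,000 jobs per day figure is an invented workload for illustration only. The second helper assumes a positive max_use_count and perfect packing, so treat its result as a lower bound.

```python
import math


def fleet_ceiling(capacity_per_instance: int, max_instances: int) -> int:
    """Most jobs the fleet can run at once when fully scaled out."""
    return capacity_per_instance * max_instances


def min_instance_launches(total_jobs: int, max_use_count: int) -> int:
    """Lower bound on instances launched to serve total_jobs.

    Each instance is retired after max_use_count jobs, so even with
    perfect packing at least this many instances have to boot.
    """
    if max_use_count <= 0:
        raise ValueError("this sketch assumes a positive max_use_count")
    return math.ceil(total_jobs / max_use_count)


print(fleet_ceiling(4, 10))             # 40 job slots at peak
print(min_instance_launches(1000, 20))  # 50 boots for a 1,000-job day
```

The second number is the trade-off behind max_use_count: lowering it gives cleaner isolation between builds, but every retirement costs another instance boot.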

Make spot interruptions a non-event

Spot is cheap because AWS can reclaim it with two minutes’ notice. The goal is for an interruption to cost you a retried job, not a corrupted pipeline.

First, lean on instance-type diversification in the ASG so a single capacity pool draining doesn’t take you to zero. Second, make jobs idempotent and retryable so a killed instance just means GitLab reschedules:

build:
  script: ./build.sh
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

runner_system_failure is exactly the class of error a spot reclaim produces. Retrying it automatically turns most interruptions invisible. Don’t blanket-retry on script_failure — that just reruns genuinely broken builds and hides bugs.

Right-size with idle scaling schedules

Most teams have a daily rhythm: quiet overnight, busy from mid-morning. You can shape warm capacity to match instead of paying for idle instances at 3am:

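    # Order matters: to my understanding the runner applies the last policy
    # whose periods match, so the catch-all goes first. Verify against the
    # docs for your runner version.
    [[runners.autoscaler.policy]]
      periods = ["* * * * *"]
      idle_count = 0
      idle_time = "10m0s"

    [[runners.autoscaler.policy]]
      periods = ["* 9-18 * * mon-fri"]
      idle_count = 4
      idle_time = "30m0s"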

Warm pool during business hours, scale-to-zero otherwise. This is where the cost savings get real — you’re paying for capacity only when developers are actually pushing code.

The gotchas I hit

A few things that cost me time so they don’t cost you yours:

AMI boot time is your floor. If your AMI takes three minutes to be SSH-ready, idle_count = 0 means every cold start pays that. Bake Docker and your common base images into the AMI.
Disk fills up on reused instances. Layers and caches accumulate. A modest max_use_count plus a periodic docker system prune in a cleanup job keeps disk from becoming the silent failure.
IAM scope creep. The runner only needs to describe and scale its ASG. Don’t hand it broad EC2 permissions because the example did.
Don’t undersize concurrent. It should be at least capacity_per_instance * max_instances (4 * 10 = 40 in the config above), or you cap yourself below the capacity you’re paying for.
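The concurrent check is easy to automate. Below is a small sketch that takes an already-parsed config.toml as a dict and reports how many job slots are unreachable. The function name concurrency_shortfall is hypothetical, and summing across all autoscaling runners is my assumption about how the global concurrent limit should be compared when more than one runner is defined.

```python
def concurrency_shortfall(config: dict) -> int:
    """Return how many autoscaled job slots `concurrent` leaves unreachable."""
    ceiling = 0
    for runner in config.get("runners", []):
        autoscaler = runner.get("autoscaler")
        # Skip runners that don't set both knobs explicitly; this sketch
        # makes no assumption about what their defaults mean.
        if not autoscaler:
            continue
        if "capacity_per_instance" not in autoscaler or "max_instances" not in autoscaler:
            continue
        ceiling += autoscaler["capacity_per_instance"] * autoscaler["max_instances"]
    # GitLab Runner's documented default for `concurrent` is 1.
    return max(0, ceiling - config.get("concurrent", 1))


# Mirrors the config from this post.
config = {
    "concurrent": 40,
    "runners": [
        {
            "name": "fleeting-spot",
            "autoscaler": {"capacity_per_instance": 4, "max_instances": 10},
        }
    ],
}
print(concurrency_shortfall(config))                          # 0: nothing wasted
print(concurrency_shortfall({**config, "concurrent": 20}))    # 20 slots unreachable
```

On Python 3.11 or newer, the dict can come from tomllib.load on the runner's config.toml opened in binary mode, which makes this a cheap pre-deploy check.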

Where to go from here

Fleeting is a real improvement over the Docker Machine era: the cloud owns instance lifecycle, and you get clean knobs for capacity, reuse, and idle scaling. Pair it with diversified spot and retryable jobs and you get CI that’s both elastic and cheap, without the 1am state-file archaeology.

For more on runner architecture and keeping CI costs down, see our GitLab CI/CD guides. And when you’re reviewing changes to runner or IAM config, our AI code review assistant helps catch over-broad permissions before they ship.

Cloud autoscaling behavior and pricing vary by region and account. Test scaling policies against your real traffic before relying on them.
