DevOps AI ToolKit
AI for Automation · By James Joyner IV · 10 min read

Rundeck Job-as-Code: A Version-Controlled Operations Library

Turn ad-hoc Rundeck clicking into a reviewed, version-controlled job library with scoped access, dry-run options, and audit — using AI to draft job definitions you review.

  • #automation
  • #ai
  • #rundeck
  • #runbook
  • #operations

Rundeck starts as a gift and quietly becomes a liability. The gift is obvious: a web UI where operators run operational tasks without SSHing into boxes, with output captured and access controlled. The liability creeps in as jobs get created by clicking through the UI, tweaked in place, and never reviewed. Eighteen months later you have two hundred jobs, no record of who changed what, three nearly-identical “restart the service” jobs with subtly different behavior, and a job called cleanup-temp that nobody is quite sure is safe to run. The UI that made operations easy made them ungoverned.

Job-as-code reverses this. Rundeck job definitions are exportable as YAML, which means they can live in git, go through pull-request review, and deploy through a pipeline like any other code. The operators still get their friendly UI; the jobs behind it become reviewed, versioned artifacts. AI is a natural fit for drafting these definitions — the YAML is structured and repetitive — while you keep judgment over what each job is allowed to do.

Define the Job in Version Control

A Rundeck job exported to YAML is reviewable in a way a UI-clicked job never is. Here’s a restart job with the guardrails that matter:

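- name: restart-service
  group: ops/maintenance
  description: "Gracefully restart a service on selected nodes"
  options:
    - name: service
      required: true
      enforced: true                              # reject anything outside `values`
      values: [checkout, payments, search]        # allowlist, not free text
    - name: dry_run
      enforced: true
      values: ['true', 'false']                   # option values are strings
      value: 'true'                               # default value: safe by default
  nodefilters:
    dispatch:
      keepgoing: false                            # stop at the first failed node
    filter: "tags: app-tier"
  sequence:
    keepgoing: false                              # stop at the first failed step
    commands:
      - script: |
          #!/usr/bin/env bash
          set -euo pipefail
          # Rundeck substitutes the option tokens before the script runs.
          # Anything other than an explicit "false" is treated as a dry run.
          if [ "@option.dry_run@" != "false" ]; then
            echo "DRY RUN: would restart @option.service@ on $(hostname)"
            exit 0
          fi
          systemctl reload-or-restart "@option.service@"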

Several guardrails are doing real work. The service option is an allowlist of known services, not a free-text field that could be made to target something unintended. In Rundeck a values list is only a suggestion unless the option is also marked enforced: true, so check for both. dry_run defaults to true, so the safe action is the default and the destructive one is a deliberate choice, the same default-safe principle behind approval-gated automation guardrails. And keepgoing: false under sequence stops the workflow at the first failed step; halting at the first failed node rather than plowing through the rest of the fleet is governed separately by keepgoing in the nodefilters dispatch settings, so verify that one too. When a model drafts a job for you, these are the fields to verify, because the naive draft tends to use free-text options and omit the dry-run path.
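The review of a drafted job can be partly mechanized. The following is a minimal sketch of a guardrail lint, not a Rundeck feature: it takes one job already parsed into a Python dict (for example by a YAML parser) and reports which guardrails are missing. The function name `lint_job` is hypothetical, and the check assumes the Rundeck option keys `enforced` (reject values outside the list) and `value` (the default).

```python
from typing import Any


def lint_job(job: dict[str, Any]) -> list[str]:
    """Return a list of guardrail problems found in one parsed job definition."""
    problems: list[str] = []
    options = {opt.get("name"): opt for opt in job.get("options", [])}

    # Every option should be an enforced allowlist rather than free text.
    for name, opt in options.items():
        if not opt.get("values"):
            problems.append(f"option '{name}' has no allowlist of values")
        elif not opt.get("enforced", False):
            problems.append(f"option '{name}' has values but is not enforced")

    # A dry-run option must exist and default to the safe setting.
    dry_run = options.get("dry_run")
    if dry_run is None:
        problems.append("no dry_run option")
    elif str(dry_run.get("value", "")).lower() != "true":
        problems.append("dry_run does not default to true")

    # The workflow should stop at the first failed step.
    if job.get("sequence", {}).get("keepgoing", False):
        problems.append("sequence.keepgoing is not false")

    return problems
```

Run in a pull-request check, an empty list means the draft passes these particular checks. It says nothing about whether the job's commands are safe, which remains a human review.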

Scope Access With ACL Policies

A job-as-code library is only as safe as the access policy around it. Rundeck ACL policies are themselves YAML, so they version alongside the jobs. The principle is least privilege per job group: read-only diagnostics open to a broad group, destructive maintenance restricted to a narrow one.

description: "ops-readonly may run diagnostics, not maintenance"
context:
  project: production
for:
  job:
    - match:
        group: ops/diagnostics
      allow: [run, read]
    - match:
        group: ops/maintenance
      allow: [read]                               # can see, cannot run
by:
  group: ops-readonly

Splitting jobs into ops/diagnostics and ops/maintenance groups lets you grant “look but don’t touch” cleanly. This mirrors the broader credential scoping and least-privilege discipline: the job runs with Rundeck’s access to the nodes, so who can trigger it is the real control.
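Because the policy is plain data, the "look but don't touch" intent can be asserted in a test before the policy ships. The sketch below is a deliberately simplified reading of the policy above, not Rundeck's own evaluator: it ignores deny rules and other rule types, compares the project name exactly, and assumes a `match` value is a regular expression that must cover the whole job group. The names `allowed_actions` and `OPS_READONLY` are hypothetical.

```python
import re
from typing import Any


def allowed_actions(policy: dict[str, Any], user_group: str,
                    project: str, job_group: str) -> set[str]:
    """Collect the job actions this policy grants to user_group for job_group."""
    by_groups = policy.get("by", {}).get("group", [])
    if isinstance(by_groups, str):  # the policy may give one name or a list
        by_groups = [by_groups]
    if user_group not in by_groups:
        return set()
    if policy.get("context", {}).get("project") != project:
        return set()

    granted: set[str] = set()
    for rule in policy.get("for", {}).get("job", []):
        pattern = rule.get("match", {}).get("group")
        # Assumption: the match value is a regex that must cover the whole group.
        if pattern is not None and re.fullmatch(pattern, job_group):
            granted.update(rule.get("allow", []))
    return granted


# The ops-readonly policy shown above, as a YAML parser would return it.
OPS_READONLY = {
    "description": "ops-readonly may run diagnostics, not maintenance",
    "context": {"project": "production"},
    "for": {"job": [
        {"match": {"group": "ops/diagnostics"}, "allow": ["run", "read"]},
        {"match": {"group": "ops/maintenance"}, "allow": ["read"]},
    ]},
    "by": {"group": "ops-readonly"},
}
```

A pipeline test can then assert that `"run"` is absent from `allowed_actions(OPS_READONLY, "ops-readonly", "production", "ops/maintenance")`, so a later edit that widens access fails review instead of reaching production.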

Prompt: “Here is a runbook for safely draining and restarting a node. Draft a Rundeck job definition in YAML with a dry-run option defaulting to true, an allowlisted service option, keepgoing false, and a node filter by tag. Then draft an ACL policy granting run access only to an ops-oncall group and read-only to everyone else. Flag any step that is irreversible and should require a confirmation option.”

What it returns: a job YAML with the safe defaults, a scoped ACL, and a callout on irreversible steps — the flag being the useful part, since it surfaces where a job needs a second guardrail you might otherwise skip.

Consolidate the Duplicates

Once jobs live in git, the three near-identical restart jobs become visible and embarrassing in a way they never were in the UI. This is where AI earns its keep on cleanup: point it at the exported library and ask it to find jobs that overlap, differ only in parameters, or duplicate each other’s logic. It produces a credible consolidation map — “these four are the same job with a different service value; replace with one parameterized job.” You review the map, because the model can’t know that two superficially identical jobs are deliberately separate for an access-control reason. Consolidation reduces the surface area you have to reason about, which is the whole point of treating jobs as a library rather than a pile.
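A cheap first pass can narrow what you hand to the model. This sketch, under the assumption that the exported jobs have been parsed into dicts, fingerprints each job by what it does (its commands, node filter, and option names) while ignoring its name, group, description, and option values. The helper names are hypothetical, and it catches only exact matches; near-duplicates with hardcoded differences are left for the model and your review.

```python
import hashlib
import json
from collections import defaultdict
from typing import Any


def workflow_fingerprint(job: dict[str, Any]) -> str:
    """Hash only the parts of a job that define what it does, not what it is called."""
    core = {
        "commands": job.get("sequence", {}).get("commands", []),
        "nodefilter": job.get("nodefilters", {}).get("filter"),
        "option_names": sorted(opt.get("name", "") for opt in job.get("options", [])),
    }
    canonical = json.dumps(core, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(jobs: list[dict[str, Any]]) -> list[list[str]]:
    """Group job paths (group/name) that share an identical workflow fingerprint."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for job in jobs:
        path = f"{job.get('group', '')}/{job['name']}"
        buckets[workflow_fingerprint(job)].append(path)
    return [sorted(paths) for paths in buckets.values() if len(paths) > 1]
```

Each returned list is a candidate for one parameterized job. Whether the copies are deliberately separate, for example because they sit under different ACLs, is still a decision for the reviewer.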

Verify in a Non-Production Project First

Job-as-code means jobs deploy through a pipeline, so verify them there. Run a new or changed job in a non-production Rundeck project with dry_run true and confirm it reports the right intended actions without doing them. Then run it for real against a single throwaway node and confirm the output matches. Only then promote it to the production project. Because the job definition is in git, this is a normal code-review-and-deploy flow, not a hope-it-works click in production.
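The dry-run check itself can be a pipeline assertion rather than a visual scan of the log. As a minimal sketch, assume the execution output has already been fetched as plain text lines, that the job prints the "DRY RUN: would restart ..." line from the restart job above, and that each node's hostname matches its Rundeck node name. The function name `check_dry_run_output` is hypothetical.

```python
def check_dry_run_output(lines: list[str], service: str,
                         expected_nodes: set[str]) -> list[str]:
    """Return problems found in the captured output of a dry-run execution."""
    prefix = f"DRY RUN: would restart {service} on "
    problems: list[str] = []
    reported: set[str] = set()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(prefix):
            reported.add(line[len(prefix):])
        else:
            # Any other output suggests the job did more than report intent.
            problems.append(f"unexpected output: {line!r}")

    for node in sorted(expected_nodes - reported):
        problems.append(f"no dry-run report from node {node}")
    for node in sorted(reported - expected_nodes):
        problems.append(f"dry-run ran on unexpected node {node}")
    return problems
```

An empty result means every expected node reported its intended action and nothing else appeared in the log. The second problem type is the one worth watching, since it reveals a node filter that matches more of the fleet than intended.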

The collaboration pattern is consistent with the rest of AI for Automation: the model drafts job definitions, ACL policies, and consolidation maps faster than hand-editing YAML, while you own the safety decisions — the allowlists, the dry-run defaults, the access scoping, and which jobs are too dangerous to run without a human. For the design checklist, see the Rundeck job-as-code prompt.
