Backup-as-a-Service with OpenStack Freezer and AI

Backups are the part of cloud operations everyone agrees is important and nobody enjoys, right up until the day you need a restore, and then it is the only thing that matters. On OpenStack, ad-hoc volume snapshots and hand-rolled rsync scripts are how most teams start, and they are also how most teams discover, mid-disaster, that their backups were incomplete. OpenStack Freezer is the backup-as-a-service project built to replace that: scheduled, multi-tenant, deduplicated backups with a real restore path, storing data in Swift, S3, or local media.

Freezer’s job definitions and restore semantics are exactly the kind of thing where a small mistake is catastrophic — a wrong restore target overwrites live data. So I keep an AI assistant firmly in the fast-junior-engineer role: it drafts job JSON and restore commands, I verify every path and target, and it never runs a restore against anything real.

Confirming Freezer Is Running

Freezer has an API, a scheduler, and agents on the nodes being backed up. Confirm the API and registered clients first:

openstack backup client list
openstack backup job list

If the client list is empty, no Freezer agents have registered, so the scheduler has nothing to drive. That is an agent-config problem on the target nodes, not an API issue, and it is the most common reason “Freezer is not backing up.”

Defining a Backup Job

A Freezer job is a JSON document describing what to back up, where to store it, and on what schedule. A filesystem backup to Swift looks roughly like:

openstack backup job create \
  --file backup-job.json

with backup-job.json containing the action — source path, storage backend, container, compression, and schedule. This JSON is dense and unforgiving, and it is the first thing I hand to AI. I describe “a nightly incremental backup of /var/lib/mysql to the Swift container db-backups, with seven-day retention,” and the model produces structured job JSON with the right max_level, storage, and container keys.
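As a rough sketch of what that request turns into, the snippet below builds the job document as a Python dict and prints it as JSON. The max_level, storage, container, and remove_older_than keys are the ones named in this article; the surrounding layout (job_actions, freezer_action, job_schedule, schedule_interval, path_to_backup, mode, backup_name) is my recollection of the Freezer job schema and should be checked against the docs for your release. The helper name build_backup_job and the default backup name are hypothetical.

```python
import json


def build_backup_job(source_path, container, retention_days, max_level,
                     backup_name="mysql-nightly", interval="24 hours"):
    """Build a Freezer-style backup job document as a plain dict.

    Key names other than max_level, storage, container, and
    remove_older_than are assumptions to verify against your release.
    """
    if not source_path.startswith("/"):
        raise ValueError("source_path must be an absolute path")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    if max_level < 0:
        raise ValueError("max_level cannot be negative")

    return {
        "description": f"Incremental backup of {source_path}",
        "job_schedule": {"schedule_interval": interval},
        "job_actions": [
            {
                "freezer_action": {
                    "action": "backup",
                    "mode": "fs",
                    "backup_name": backup_name,
                    "path_to_backup": source_path,
                    "storage": "swift",
                    "container": container,
                    "max_level": max_level,
                    "remove_older_than": retention_days,
                }
            }
        ],
    }


if __name__ == "__main__":
    job = build_backup_job("/var/lib/mysql", "db-backups",
                           retention_days=7, max_level=6)
    print(json.dumps(job, indent=2))
```

Generating the document from a function rather than editing JSON by hand means the source path and retention are passed in explicitly, which makes them easy to spot in review.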

Then I verify, carefully, because the source path and the retention are the two fields where a mistake hurts. A wrong source path silently backs up nothing; a wrong retention quietly deletes your history.

Pro Tip: Always set and double-check the retention/remove_older_than field by hand, no matter how confident the AI is. The failure mode that ends careers is not a backup that does not run — it is a backup that ran for months while silently pruning the very restore points you needed.
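A hand review is easier to keep honest with a small check that compares the drafted action against the values I wrote down before asking the AI. This is a minimal sketch, not a Freezer feature: the function name is hypothetical, and it assumes the source path lives under a path_to_backup key next to remove_older_than.

```python
def review_backup_action(action, expected_source, expected_retention_days):
    """Return a list of problems found; an empty list means nothing flagged."""
    problems = []

    # Assumed key name for the source path; adjust to your job schema.
    source = action.get("path_to_backup")
    if source != expected_source:
        problems.append(
            f"source path is {source!r}, expected {expected_source!r}")

    retention = action.get("remove_older_than")
    if retention is None:
        problems.append("remove_older_than is not set")
    elif retention != expected_retention_days:
        problems.append(
            f"remove_older_than is {retention!r}, "
            f"expected {expected_retention_days!r}")

    return problems


if __name__ == "__main__":
    drafted = {"path_to_backup": "/var/lib/mysq", "remove_older_than": 1}
    for problem in review_backup_action(drafted, "/var/lib/mysql", 7):
        print("REVIEW:", problem)
```

The check does not replace reading the JSON; it only catches the case where the draft drifted from what I intended.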

Listing and Verifying Backups

A backup you have never restored is a hypothesis, not a backup. List what Freezer has actually stored:

openstack backup session list
openstack backup job show <job-id>

The session list shows completed runs. I periodically pull this and ask Claude to sanity-check the cadence — “are there any gaps in this nightly schedule over the last month?” Spotting a missing night in a list of timestamps is tedious for me and trivial for the model.
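The same gap check is simple enough to script, which gives me something to compare the model's answer against. The sketch below assumes I have already extracted the run timestamps as ISO 8601 strings; how they are pulled out of the CLI output is left out, and the function name is hypothetical.

```python
from datetime import date, datetime, timedelta


def missing_nights(run_timestamps, first_night, last_night):
    """Return each date from first_night to last_night that has no run."""
    nights_with_runs = {
        datetime.fromisoformat(stamp).date() for stamp in run_timestamps
    }
    missing = []
    night = first_night
    while night <= last_night:
        if night not in nights_with_runs:
            missing.append(night)
        night += timedelta(days=1)
    return missing


if __name__ == "__main__":
    runs = ["2024-03-01T02:00:04", "2024-03-02T02:00:07",
            "2024-03-04T02:00:03"]
    for night in missing_nights(runs, date(2024, 3, 1), date(2024, 3, 5)):
        print("no backup on", night.isoformat())
```

With the sample data, the nights of 3 March and 5 March are reported as gaps.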

Incremental Backups and Levels

Full backups every night are simple but expensive in storage and time. Freezer supports incremental backups via levels — a periodic full backup (level 0) followed by incrementals that capture only what changed. The max_level field controls how many incrementals chain before a new full is forced. Getting this right is the difference between a backup strategy that fits your storage budget and one that quietly fills Swift.

The tradeoff is restore complexity: restoring a level-5 incremental requires the full plus every incremental in between, so a longer chain means a slower, more fragile restore. I describe my desired cadence — “weekly full, daily incrementals” — and the AI sets max_level accordingly in the job JSON. Then I think carefully about the restore implications myself, because the model optimizes for the cadence I asked for, not for how painful the restore will be at 3 a.m. A shorter chain costs more storage but restores faster and breaks less often, and that is an operational judgment I own rather than delegate. The model handles the arithmetic; I handle the risk.
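The arithmetic is small enough to write down. This sketch assumes levels count up from 0 (the full) to max_level and then reset to a new full, which is how I understand the field; both function names are hypothetical.

```python
def max_level_for(runs_per_cycle):
    """max_level for a cycle of runs_per_cycle backups, one of them full.

    Level 0 is the full, so a cycle of N runs ends at level N - 1.
    """
    if runs_per_cycle < 1:
        raise ValueError("a cycle needs at least one run")
    return runs_per_cycle - 1


def restore_chain(level):
    """Levels that must all be intact to restore a backup at this level."""
    if level < 0:
        raise ValueError("level cannot be negative")
    return list(range(level + 1))


if __name__ == "__main__":
    # Weekly full with daily incrementals: seven runs per cycle.
    print("max_level:", max_level_for(7))
    print("pieces needed for a level-5 restore:", len(restore_chain(5)))
```

For a weekly full with daily incrementals this gives max_level 6, and a level-5 restore depends on six stored pieces: the full plus five incrementals. Halving the cycle halves the worst-case chain, at the cost of more full backups in storage.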

Running a Restore Carefully

This is the dangerous command, so it gets the most human attention. A restore points a stored backup at a target path:

openstack backup job create --file restore-job.json

where the restore job specifies restore_abs_path. I have AI draft this JSON from the backup metadata, but I read the restore target three times before running it, and I always restore to a staging path first, never directly over live data. The model can generate a perfectly valid restore job aimed at exactly the wrong directory — valid syntax, catastrophic outcome.
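One way to back up the "staging first" habit is a guard that refuses any restore_abs_path outside a designated staging directory before the job file is ever submitted. This is a minimal sketch under my own assumptions: the staging root /srv/restore-staging and the function name are hypothetical, and the check is purely lexical, so it does not follow symlinks.

```python
from pathlib import PurePosixPath


def is_staging_target(restore_abs_path, staging_root="/srv/restore-staging"):
    """Return True only if the restore target sits inside the staging root."""
    target = PurePosixPath(restore_abs_path)
    root = PurePosixPath(staging_root)

    if not target.is_absolute():
        return False
    # PurePosixPath does not collapse "..", so reject it outright
    # rather than risk a path that climbs back out of staging.
    if ".." in target.parts:
        return False
    return target == root or root in target.parents


if __name__ == "__main__":
    for candidate in ("/srv/restore-staging/mysql", "/var/lib/mysql"):
        verdict = "ok" if is_staging_target(candidate) else "REFUSED"
        print(candidate, "->", verdict)
```

The guard only answers "is this inside staging"; whether the staging directory is the right one for this restore is still mine to read and confirm.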

When a restore is part of a real disaster, I run the whole operation through my incident response dashboard so every step is logged while the pressure is on.

Testing Restores on a Schedule

The discipline that separates real backups from theater is regular restore drills. I have AI draft a wrapper that picks a recent backup, restores it to a throwaway volume, mounts it, and checks a known file exists — then tears the volume down. That script is reviewable code, so it goes through my code review dashboard before it joins my monthly drill. A restore you test monthly is a restore you can trust at 3 a.m.
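The verification step of that wrapper can be sketched on its own. The snippet below covers only the "check a known file exists" part, assuming the restored data is already mounted at some directory; the restore, mount, and teardown steps are deliberately omitted, and the function name and sample file name are hypothetical.

```python
import tempfile
from pathlib import Path


def verify_restore(restore_root, sentinel_relpath, min_bytes=1):
    """Check that a known file exists under the restored tree.

    Returns (ok, reason) so the drill can log why a check failed.
    """
    sentinel = Path(restore_root) / sentinel_relpath
    if not sentinel.is_file():
        return False, f"missing file: {sentinel}"
    size = sentinel.stat().st_size
    if size < min_bytes:
        return False, f"file too small: {sentinel} is {size} bytes"
    return True, "ok"


if __name__ == "__main__":
    # Stand-in for a mounted throwaway volume.
    with tempfile.TemporaryDirectory() as scratch:
        print(verify_restore(scratch, "ibdata1"))
        (Path(scratch) / "ibdata1").write_bytes(b"restored data")
        print(verify_restore(scratch, "ibdata1"))
```

An existence check is the weakest useful test; comparing a checksum recorded at backup time would be a stronger one, if the drill has access to it.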

Guardrails

Freezer has the highest stakes of any project on this list, because both its failures (silently missing backups) and its successes (a restore over live data) can destroy information. My rules are strict:

The AI drafts backup and restore JSON; it never holds production credentials and never runs a restore.
Restores always target a staging path first, verified by a human, before anything touches production data.
Retention fields are reviewed by hand on every job, every time.

My vetted Freezer prompts live in the prompt workspace, the reusable templates are in the OpenStack prompt pack, and I edit the job JSON in Cursor with GitHub Copilot handling the repetitive fields.

The Takeaway

Freezer turns backups from a pile of fragile scripts into a scheduled, restorable, multi-tenant service, and an AI assistant takes the pain out of its dense JSON. But this is the one project where I am most insistent on the guardrails: the model drafts, the human verifies the source and retention, and restores always land in staging first. Get that discipline right and you will have backups that actually save you, not backups that fail on the day you need them.

The mental model I hold onto is that backups are a restore service, not a backup service. Nobody cares that data was copied somewhere; they care that it comes back, intact, when everything else has failed. Freezer gives you the scheduled copying and the restore path, and an AI assistant makes the configuration fast, but the part that actually matters (the monthly restore drill, the verified retention, the staging-first discipline) stays firmly human. That is not a limitation of the AI; it is the right division of labor for the one system whose failure you can never undo.

If you want a backup-and-restore strategy you can genuinely trust on your OpenStack cloud, work with me, or keep reading across the OpenStack category and the prompt library.