Ransomware-Resilient Backups: Immutability and Recovery

The first time I watched a ransomware tabletop go sideways, it wasn’t the encryption that killed us — it was the realization that the same domain admin credential the attacker had stolen could also reach our backup server and our S3 bucket. We had backups. We had a lot of backups. And every single one of them was deletable by the identity that had just been compromised. That afternoon changed how I think about backups entirely: a backup you can delete under duress is not a backup, it’s a liability with a timestamp.

This post is strictly defensive. The goal is to make your backups survive an attacker who already has your credentials, your console, and a very bad attitude. Everything below is something you can build today with restic, borg, and S3 Object Lock.

The 3-2-1-1-0 Rule Is the Whole Game

The classic 3-2-1 rule (three copies, two media types, one offsite) predates the modern ransomware era. The current version adds two critical digits: 3-2-1-1-0.

3 copies of your data
2 different media or storage classes
1 copy offsite
1 copy that is offline or immutable (air-gapped, or write-once-read-many)
0 errors verified through regular restore testing

The last two are the ones that defeat ransomware specifically. An online, mutable replica is not protection; it is a second target. The “1” immutable copy is the one an attacker cannot encrypt or delete, even with stolen credentials, and the “0” is the recovery drill that proves the copy actually restores. Plenty of breached organizations had the first three numbers. Far fewer had the last two.
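
The rule above is easy to turn into a checklist. Here is a minimal Python sketch, assuming you keep a small inventory of your copies (including the production copy) and the error count from your last restore drill. The `BackupCopy` fields and the `check_32110` name are illustrative, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str                  # e.g. "disk", "s3", "lto" (labels are up to you)
    offsite: bool
    immutable_or_offline: bool


def check_32110(copies, last_drill_errors):
    """Return a list of gaps against 3-2-1-1-0; an empty list means no gaps.

    last_drill_errors is the error count from the most recent restore drill,
    or None if no drill has been run.
    """
    gaps = []
    if len(copies) < 3:
        gaps.append(f"3: only {len(copies)} copies")
    if len({c.media for c in copies}) < 2:
        gaps.append("2: fewer than two media types")
    if not any(c.offsite for c in copies):
        gaps.append("1: no offsite copy")
    if not any(c.immutable_or_offline for c in copies):
        gaps.append("1: no immutable or offline copy")
    if last_drill_errors is None:
        gaps.append("0: no restore drill on record")
    elif last_drill_errors > 0:
        gaps.append(f"0: last drill had {last_drill_errors} errors")
    return gaps
```

An inventory with production disk, a NAS, and a mutable S3 replica passes the first three checks and fails the last two, which is exactly the pattern described above.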

S3 Object Lock: Compliance vs Governance Mode

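S3 Object Lock gives you WORM (write-once-read-many) storage in the cloud. The simplest path is to enable it at bucket creation. AWS now also lets you turn it on for an existing versioned bucket, but objects written before that point are not locked retroactively, so a fresh bucket is the cleaner starting point.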

aws s3api create-bucket \
  --bucket acme-immutable-backups \
  --object-lock-enabled-for-bucket \
  --region us-east-1

aws s3api put-object-lock-configuration \
  --bucket acme-immutable-backups \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 30 } }
  }'

The mode choice is the whole security decision:

Governance mode can be overridden by an identity holding s3:BypassGovernanceRetention. It protects against accidents and ordinary users, but a sufficiently privileged attacker (or root) can still bypass it. Useful, but not ransomware-proof on its own.
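Compliance mode cannot be shortened or bypassed by anyone (not your admins, not your root user, not AWS support) until the retention period expires. The one escape hatch AWS documents is closing the entire account, which is one more reason to keep the vault in an account of its own. This is the closest thing to a real air gap in the cloud.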

For ransomware resilience, use Compliance mode for your immutable tier. Yes, it means you genuinely cannot delete those objects early, even if you fat-finger a 7-year retention. Start with a short window (7–30 days) until you trust your tooling.
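
Because a Compliance lock cannot be undone, it is worth linting the configuration before applying it. The sketch below checks the same JSON shape passed to put-object-lock-configuration. The function name and the 7 to 400 day bounds are my assumptions; pick limits that match your own retention policy.

```python
def lock_config_problems(config, min_days=7, max_days=400):
    """Return a list of problems found in an Object Lock configuration dict."""
    problems = []
    if config.get("ObjectLockEnabled") != "Enabled":
        problems.append("Object Lock is not enabled")

    retention = config.get("Rule", {}).get("DefaultRetention", {})
    mode = retention.get("Mode")
    if mode != "COMPLIANCE":
        problems.append(f"mode is {mode!r}, expected 'COMPLIANCE'")

    # DefaultRetention carries either Days or Years; normalise to days.
    if "Days" in retention:
        days = retention["Days"]
    elif "Years" in retention:
        days = retention["Years"] * 365
    else:
        days = None

    if days is None:
        problems.append("no default retention period")
    elif not min_days <= days <= max_days:
        problems.append(f"retention of {days} days is outside {min_days}-{max_days}")
    return problems
```

Running this in CI against the file you are about to apply catches the fat-fingered 7-year lock while it is still only a typo.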

Pro Tip: Pair Object Lock with versioning (required anyway) and a lifecycle rule that transitions old immutable versions to Glacier. You get cheap, undeletable, long-tail retention without paying standard storage prices for months of locked objects.

restic and borg with Append-Only Repositories

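Your backup client should never hold the keys to destroy its own history. borg has a native append-only mode. restic has no equivalent client flag; the restriction comes from the backend, either rest-server started with --append-only or, as here, storage-side controls (Object Lock plus credentials with no delete rights). Either way, the client can add data but cannot prune or delete it. That authority lives elsewhere.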

restic against the locked bucket:

export RESTIC_REPOSITORY="s3:s3.amazonaws.com/acme-immutable-backups/prod"
export RESTIC_PASSWORD_FILE="/etc/restic/passphrase"

restic init
restic backup /var/lib/postgresql /etc /home \
  --tag nightly --verbose

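With borg, the append-only flag is enforced server-side in authorized_keys, so a compromised client cannot switch it off:

# ~/.ssh/authorized_keys on the backup host
command="borg serve --append-only --restrict-to-path /srv/borg/prod",restrict ssh-ed25519 AAAA...client-key

# on the client (the borg user and backup-host name are placeholders): this works
borg create ssh://borg@backup-host/srv/borg/prod::nightly-{now} /var /etc /home

# this appears to succeed, but append-only mode only logs the deletion; no data is removed
borg prune --keep-daily 7 ssh://borg@backup-host/srv/borg/prod

Pruning happens from a separate, more privileged path that the production host never touches. One caveat: a prune or delete issued by a compromised client is still recorded in the repository's transaction log, and it becomes permanent once a later non-append-only operation compacts the repository. Check the repository for unexpected deletions before you run privileged maintenance. Until then, the compromised machine can write new backups all day; it cannot destroy the past.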

Separate Credentials and Blast-Radius Isolation

This is the single most overlooked control. The identity that writes backups must not be the identity that can delete them, and neither should be reachable from your production blast radius.

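A no-delete IAM policy for the backup agent. restic needs read and list access as well as write, so this is not literally write-only. One likely side effect: restic cleans up its own lock files after each run, so expect warnings under a blanket delete deny unless you carve out the locks/ prefix. Note there is no DeleteObject, no PutObjectRetention, no BypassGovernanceRetention: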

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BackupWriteOnly",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::acme-immutable-backups",
        "arn:aws:s3:::acme-immutable-backups/*"
      ]
    },
    {
      "Sid": "DenyDestructive",
      "Effect": "Deny",
      "Action": [
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:PutObjectRetention",
        "s3:PutBucketObjectLockConfiguration",
        "s3:BypassGovernanceRetention"
      ],
      "Resource": "*"
    }
  ]
}

Put the backup bucket in a separate AWS account with its own root user, isolated from production. Use SCPs to deny s3:PutBucketObjectLockConfiguration and s3:DeleteBucket across the organization. Lifecycle and retention management runs from a third, tightly held break-glass role behind MFA. If the attacker owns prod, they own an add-only door into a vault they cannot empty.
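
A policy like the one above is easy to get subtly wrong, so it helps to check it mechanically. This is a rough Python sketch, not an IAM evaluator: it ignores NotAction, Resource scoping, and Condition semantics (a Deny with a Condition is simply not counted), and the `MUST_DENY` list is my assumption about what counts as destructive. Extend it to match your own policy.

```python
import fnmatch

# Assumed list of destructive actions the backup agent must never hold.
MUST_DENY = [
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:PutBucketObjectLockConfiguration",
    "s3:BypassGovernanceRetention",
]


def _as_list(value):
    return [value] if isinstance(value, (str, dict)) else list(value)


def _covered(action, patterns):
    # IAM action names are case-insensitive and may contain * and ? wildcards.
    return any(fnmatch.fnmatchcase(action.lower(), p.lower()) for p in patterns)


def audit_policy(policy, must_deny=MUST_DENY):
    """Report destructive actions that are not explicitly denied, or are allowed."""
    denied, allowed = [], []
    for stmt in _as_list(policy["Statement"]):
        actions = _as_list(stmt.get("Action", []))
        if stmt.get("Effect") == "Deny" and "Condition" not in stmt:
            denied.extend(actions)
        elif stmt.get("Effect") == "Allow":
            allowed.extend(actions)
    return {
        "not_denied": [a for a in must_deny if not _covered(a, denied)],
        "allowed": [a for a in must_deny if _covered(a, allowed)],
    }
```

Both lists should come back empty. An entry in "allowed" usually means a wildcard such as s3:* slipped into an Allow statement. The explicit Deny still wins inside this one policy, but the wildcard is worth removing anyway.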

Immutability Windows and Offline Copies

Match your immutability window to your real-world detection-to-recovery time. If your typical dwell time before detection is 11 days, a 7-day lock leaves you exposed: an attacker who lingers can simply wait out a too-short window before triggering. Published dwell-time medians shift from year to year, so use your own incident data where you have it. I default to a 30-day Compliance lock for daily backups and a 1-year lock for monthly archives.
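
The arithmetic is worth doing explicitly. The sketch below is a simplified model of my own, not a standard formula. The intrusion starts at day 0, the attacker triggers at day dwell_days, and backups were taken every interval_days before the intrusion. It counts the pre-intrusion backups whose lock has not yet expired at trigger time.

```python
def clean_locked_points(lock_days, dwell_days, interval_days=1):
    """Count pre-intrusion backups that are still locked when the attacker triggers.

    A backup taken at day t (t < 0) is locked until day t + lock_days,
    so it is still protected at trigger time if t + lock_days > dwell_days.
    """
    count = 0
    taken = -interval_days  # most recent backup before the intrusion
    while taken + lock_days > dwell_days:
        count += 1
        taken -= interval_days
    return count
```

With an 11-day dwell time, a 7-day lock leaves zero clean protected restore points, while a 30-day lock on daily backups leaves 18. Unlocked backups are not deleted automatically, but anyone with delete rights can now remove them.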

For the truly air-gapped “1”, nothing beats media that is physically disconnected:

# rotate an offline LUKS-encrypted drive weekly
cryptsetup open /dev/sdb1 backup-cold
mount /dev/mapper/backup-cold /mnt/cold
borg create /mnt/cold/borg::archive-{now} /srv/data
umount /mnt/cold && cryptsetup close backup-cold
# then physically eject and store the drive — it is now unreachable by any attacker

A drive sitting in a fire safe cannot be encrypted over the network. It feels old-fashioned. It also works when everything else is on fire.

Recovery Drills: The “0” That Everyone Skips

A backup is a hypothesis until you restore it. Untested backups fail at the worst possible moment — wrong path, missing passphrase, corrupted chunk, an exclude rule that quietly skipped your database. Run drills on a schedule, into an isolated environment, and time them.

# verify repository integrity without restoring
restic check --read-data-subset=10%

# full restore into a clean, isolated target and assert
restic restore latest --target /restore-test
diff -r /restore-test/etc /golden/etc && echo "RESTORE OK"

# borg equivalent
borg check --verify-data /srv/borg/prod
borg extract --dry-run /srv/borg/prod::nightly-2026-06-16

Record the recovery time you actually hit and compare it against the Recovery Time Objective in the runbook. Treat a failed drill as a Sev-2 incident. If you want a structured place to run that exercise, our incident-response workspace is built for exactly this kind of timed tabletop, and the monitoring-alerts dashboard can hold the alert rules below.
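
If diff -r output is awkward to assert on in an automated drill, a hash comparison gives you a structured result. A minimal Python sketch, assuming a known-good "golden" tree to compare against; the function names are mine, and symlinks are followed rather than compared as links.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large database files do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(golden, restored):
    """Return files that are missing, unexpected, or different in the restore."""
    want, got = tree_digests(golden), tree_digests(restored)
    return {
        "missing": sorted(want.keys() - got.keys()),
        "extra": sorted(got.keys() - want.keys()),
        "changed": sorted(k for k in want.keys() & got.keys() if want[k] != got[k]),
    }
```

A drill passes only when all three lists are empty. Logging the result alongside the elapsed wall-clock time gives you a record to compare from one drill to the next.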

Monitoring Success and Detecting Mass-Encryption

Two signals matter: backups that stop succeeding, and data that suddenly looks encrypted.

For backup health, fail loud. A backup job that silently stops is how organizations discover, mid-incident, that their last good copy is six weeks old. Emit a metric on every run and alert on absence:

# if/else keeps a failed success-ping from also firing the /fail ping
if restic backup /data --tag nightly; then
  curl -fsS "https://hc-ping.com/$UUID"
else
  curl -fsS "https://hc-ping.com/$UUID/fail"
fi
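
If you would rather run the "alert on absence" check yourself than rely on a hosted heartbeat service, the core logic is a dead-man's switch. A minimal sketch, assuming you record the timestamp of the last successful run somewhere the monitor can read it; the 24-hour interval and 2-hour grace period are placeholder values.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_success, now,
               interval=timedelta(hours=24), grace=timedelta(hours=2)):
    """True when no successful backup has been seen within interval + grace."""
    return now - last_success > interval + grace
```

Run this from a host outside the production blast radius, so that an attacker who kills the backup job cannot also silence the alarm.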

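For mass-encryption detection, watch for the statistical fingerprint of ransomware: a sudden spike in changed bytes, a collapse in deduplication ratio (encrypted data does not dedupe), and rising file entropy. restic stats reports the deduplicated size of what a snapshot references; the per-run delta comes from the summary message that restic backup emits with --json:

restic stats latest --mode raw-data   # deduplicated size of the blobs this snapshot references
restic backup /data --tag nightly --json | grep '"message_type":"summary"'   # data_added = new data this run
# a nightly delta that is 40x normal with near-zero dedup = investigate now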

Alert when a single backup run’s new-data volume exceeds, say, 5x the trailing 30-day median, or when entropy across newly written files jumps toward 8 bits/byte. These are early-warning tripwires — they buy you the hours that decide whether you restore from yesterday or from last quarter.
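
Both tripwires fit in a few lines. This is a minimal Python sketch with names and thresholds of my own choosing. `shannon_entropy` measures bits per byte for a sample of file content, and `volume_alert` applies the 5x trailing-median rule to a history of per-run new-data sizes.

```python
import math
from collections import Counter
from statistics import median


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def volume_alert(new_bytes, history, factor=5.0):
    """True when this run's new data exceeds factor x the median of past runs."""
    if not history:
        return False  # no baseline yet; better to collect data than to guess
    return new_bytes > factor * median(history)
```

Compressed archives, images, and video are already close to 8 bits/byte, so alert on a jump relative to each dataset's own baseline rather than on the absolute value.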

A note on using AI safely here. I lean on AI heavily to audit this kind of setup: diffing IAM policies against least-privilege, reviewing lifecycle rules, sanity-checking that a Deny statement actually covers DeleteObjectVersion. Treat the model as a fast, tireless junior engineer. It is excellent at spotting a missing condition key or an over-broad resource glob, but a human must verify every finding before it touches production. Never hand the model real passphrases, IAM secret keys, or restic repository passwords; paste redacted policy and config, not credentials. If you want a starting point, the security-hardening category and our prompts library include review-oriented templates, and the code-review dashboard is wired for exactly this audit-then-verify loop.

Conclusion

Ransomware does not beat backups; it beats deletable backups. Build to 3-2-1-1-0: enough copies, an immutable tier in S3 Compliance mode, append-only restic and borg repos, no-delete credentials isolated in a separate account, and at least one cold copy an attacker simply cannot reach. Then prove it with a timed recovery drill every month and tripwires for mass-encryption. Do that, and the worst day becomes a restore instead of a ransom.
