You are a senior OpenStack storage engineer with deep Swift experience: ring management, replication, dispersion, partition power, and rebalance impact at scale.

I will provide:
- The cluster shape (nodes, drives per node, regions, zones)
- The symptom (replication lag, drive failure, capacity imbalance, slow rebalance, dispersion warnings)
- `swift-ring-builder` outputs
- Replication logs from swift-object-replicator / swift-container-replicator

Your job:

1. **Ring fundamentals**:
   - Three rings: account, container, object (one per service type)
   - **Partition power**: fixed at create; can't change without a major rebuild (the `swift-object-relinker` tool helps)
   - **Replicas**: typically 3 across zones
   - **Regions / Zones**: failure domains; replicas spread across them
2. **For drive failure**:
   - Mark device as failed (`swift-ring-builder ring.builder set_weight <search-value> 0`, where the search value looks like `r<region>z<zone>-<ip>:<port>/<device>`)
   - Rebalance ring (`swift-ring-builder ring.builder rebalance`)
   - Distribute new ring file to ALL Swift nodes
   - Replicator moves partitions away over time
   - Then remove drive (`remove`) once empty, followed by another rebalance and ring distribution
3. **For adding capacity**:
   - Add nodes / drives with `add` command
   - Weight new drives appropriately (often start lower, ramp up)
   - Rebalance
   - Distribute ring
   - Wait for replication to fill new drives
4. **For rebalance impact**:
   - `min_part_hours` controls how often a given partition can move (set at ring creation; 1 is a common value)
   - Lower values let partitions move again sooner, so back-to-back rebalances shift more data and cause higher I/O impact
   - Production rebalances should be paced
5. **For dispersion**:
   - `swift-dispersion-report` shows how well distributed
   - High dispersion errors → ring not balanced; rebalance
6. **For replication lag**:
   - `swift-recon` shows replication state
   - Replication runs per partition; bottleneck = drives, network, or background process
7. **For partition-power-increase** (PPI):
   - Larger cluster needs more partitions
   - `swift-ring-builder ring.builder prepare_increase_partition_power`
   - Multi-step process with relinker tool

Mark DESTRUCTIVE: removing drives from ring before they're empty (data loss), `swift-ring-builder rebalance --force` without `min_part_hours` consideration (cluster-wide I/O storm), modifying weights without rebalancing and redistributing the ring.

---

Cluster shape: [DESCRIBE: N nodes, M drives, regions, zones]

Symptom: [DESCRIBE]

`swift-ring-builder object.builder`:
```
[PASTE]
```

`swift-recon --replication`:
```
[PASTE]
```

Replication logs (recent):
```
[PASTE]
```

Why this prompt works

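Swift rings are an abstraction unique to Swift, and they are confusing without hands-on practice. This prompt walks through the common ring operations (drive failure, capacity adds, rebalances, dispersion checks, partition power increases) in the order an operator would run them.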

How to use it

Build for the future: partition power should accommodate growth.
Stage rebalances: distribute the new ring and let replication settle before starting the next ring operation, rather than doing both at once.
Monitor replication after every change.
Test dispersion post-rebalance.
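
As a rough sizing aid for the first point, the Swift deployment guide suggests aiming for about 100 partitions per drive at the largest drive count you expect, rounded up to a power of two. The sketch below only does that arithmetic. The function name `suggest_part_power` and its arguments are illustrative assumptions, not part of any Swift tool.

```shell
# Suggest a partition power from the maximum number of drives expected.
# $1: maximum expected drive count
# $2: target partitions per drive (assumed default: 100)
suggest_part_power() {
    max_drives=$1
    parts_per_drive=${2:-100}
    target=$((max_drives * parts_per_drive))
    power=0
    # Find the smallest power of two that is >= target.
    while [ $((1 << power)) -lt "$target" ]; do
        power=$((power + 1))
    done
    echo "$power"
}

# 5000 drives * 100 = 500000 partitions wanted; 2^19 = 524288 is the next power of two.
suggest_part_power 5000
```

Treat the result as a starting point for discussion: the right value also depends on replica count, storage policies, and how much memory the ring may consume on each node.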

Useful commands

# Ring inspection (run on proxy or admin host with the builders)
swift-ring-builder /etc/swift/object.builder
swift-ring-builder /etc/swift/container.builder
swift-ring-builder /etc/swift/account.builder

# Add a device
swift-ring-builder /etc/swift/object.builder add \
    --region 1 --zone 1 --ip 10.0.0.5 --port 6200 \
    --device sda --weight 100

# Set weight (for migration / removal)
swift-ring-builder /etc/swift/object.builder set_weight r1z1-10.0.0.5:6200/sda 0

# Rebalance
swift-ring-builder /etc/swift/object.builder rebalance

# Verify
swift-ring-builder /etc/swift/object.builder validate
swift-ring-builder /etc/swift/object.builder

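# Distribute to all nodes (in practice via ansible / orchestration)
# Column 4 is "ip" or "ip:port" depending on the Swift version; cut strips any port (IPv4 only).
for HOST in $(swift-ring-builder /etc/swift/object.builder | awk '/^Devices:/,0' | tail -n +2 | awk '{print $4}' | cut -d: -f1 | sort -u); do
    scp /etc/swift/object.ring.gz "$HOST":/etc/swift/
done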

# Recon (cluster health)
swift-recon --replication
swift-recon --auditor
swift-recon --md5

# Dispersion
swift-dispersion-report

# Per-account / container info
swift stat
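
The prompt's advice to start new drives at a lower weight and ramp up can be sketched as a dry run that prints the `set_weight` and `rebalance` commands for each step instead of running them. The function name `print_weight_ramp`, the equal-sized steps, and the example device are assumptions for illustration; between steps you would still distribute the ring, wait for replication to settle, and respect `min_part_hours`.

```shell
# Print (do not run) the commands for a gradual weight ramp.
# $1: builder file   $2: device search value
# $3: final weight   $4: number of steps (assumed default: 4)
print_weight_ramp() {
    builder=$1
    search=$2
    target=$3
    steps=${4:-4}
    i=1
    while [ "$i" -le "$steps" ]; do
        # Integer arithmetic: step i gets i/steps of the final weight.
        weight=$((target * i / steps))
        echo "swift-ring-builder $builder set_weight $search $weight"
        echo "swift-ring-builder $builder rebalance"
        i=$((i + 1))
    done
}

# Example: ramp a hypothetical device to weight 100 in four steps (25, 50, 75, 100).
print_weight_ramp /etc/swift/object.builder r1z1-10.0.0.5:6200/sda 100 4
```

Printing the plan first makes it easy to review the steps or hand them to an orchestration tool, which is safer than looping over live ring changes.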

Common findings this catches

Replication lag growing → I/O bottleneck OR cluster imbalance.
Drive marked for removal but still has data → wait for replicator; don’t remove.
set_weight change not reflected → rebalance not run, or ring file not redistributed.
Dispersion errors → rebalance needed; or zone count too low for replicas.
min_part_hours blocking rebalance → wait or lower (carefully).
Low partition count relative to the number of drives → partition power increase (PPI) required.
Ring file mismatch across nodes → distribution incomplete; clients see inconsistent responses.
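
For the "drive marked for removal but still has data" case, one way to check on the storage node itself is to count the partition directories still present on the device. This is a minimal sketch that assumes the common on-disk layout where a device is mounted under `/srv/node/<device>` and object data lives in `objects` (plus `objects-N` for additional storage policies); the function name `count_partition_dirs` is hypothetical.

```shell
# Count object partition directories remaining on one device.
# $1: device mount point, e.g. /srv/node/sda (assumed layout)
count_partition_dirs() {
    dev_root=$1
    total=0
    # objects, objects-1, objects-2, ... one data dir per storage policy
    for datadir in "$dev_root"/objects*; do
        [ -d "$datadir" ] || continue
        n=$(find "$datadir" -mindepth 1 -maxdepth 1 -type d | wc -l)
        total=$((total + n))
    done
    echo "$total"
}

# Example: a nonzero count means the replicator has not finished draining the drive.
count_partition_dirs /srv/node/sda
```

A count of zero is a good sign but not proof on its own; confirm with `swift-recon --replication` before running `remove`.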

When to escalate

Major cluster expansion / contraction — coordinated plan.
Partition power increase — non-trivial migration.
Data corruption suspected — engage Swift / storage team; check swift-object-auditor output.

Related prompts

Glance Image Lifecycle Management Prompt

Linux Block I/O Performance Investigation Prompt

OpenStack Capacity Planning Prompt