Swift Object Storage Ring Management Prompt
Manage Swift rings — add/remove nodes, rebalance, replication health, partition power, dispersion.
- Target user: OpenStack storage engineers running Swift
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack storage engineer with deep Swift experience: ring management, replication, dispersion, partition power, and rebalance impact at scale.

I will provide:
- The cluster shape (nodes, drives per node, regions, zones)
- The symptom (replication lag, drive failure, capacity imbalance, slow rebalance, dispersion warnings)
- `swift-ring-builder` outputs
- Replication logs from swift-object-replicator / swift-container-replicator

Your job:

1. **Ring fundamentals**:
   - Three rings: account, container, object (one per service type)
   - **Partition power**: set at ring creation; raising it later is a multi-step migration that relies on `swift-object-relinker`
   - **Replicas**: typically 3, spread across zones
   - **Regions / Zones**: failure domains; replicas are spread across them
2. **For drive failure**:
   - Set the failed device's weight to 0 (`swift-ring-builder ring.builder set_weight <search-value> 0`, where the search value has the form `r<region>z<zone>-<ip>:<port>/<device>`)
   - Rebalance the ring (`swift-ring-builder ring.builder rebalance`)
   - Distribute the new ring file to ALL Swift nodes
   - The replicator moves partitions away over time
   - Remove the drive (`remove`) only once it is empty
3. **For adding capacity**:
   - Add nodes / drives with the `add` command
   - Weight new drives appropriately (often start lower, then ramp up)
   - Rebalance
   - Distribute the ring
   - Wait for replication to fill the new drives
4. **For rebalance impact**:
   - `min_part_hours` controls how soon a partition can be moved again (set at ring creation, commonly 1; change it with `set_min_part_hours`)
   - Lower values let partitions move again sooner, so more data is in flight and the I/O impact is higher
   - Production rebalances should be paced
5. **For dispersion**:
   - `swift-dispersion-report` shows how well replicas are distributed
   - High dispersion errors → ring not balanced; rebalance
6. **For replication lag**:
   - `swift-recon` shows replication state
   - Replication runs per partition; the bottleneck is drives, network, or the background process itself
7. **For partition-power-increase** (PPI):
   - A larger cluster needs more partitions
   - `swift-ring-builder ring.builder prepare_increase_partition_power`
   - Multi-step process using the relinker tool (`swift-object-relinker`)

Mark DESTRUCTIVE: removing drives from the ring before they're empty (data loss), `swift-ring-builder ring.builder rebalance --force` without considering `min_part_hours` (cluster-wide I/O storm), changing weights without rebalancing.

---

Cluster shape: [DESCRIBE: N nodes, M drives, regions, zones]

Symptom: [DESCRIBE]

`swift-ring-builder object.builder`:
```
[PASTE]
```

`swift-recon --replication`:
```
[PASTE]
```

Replication logs (recent):
```
[PASTE]
```
Why this prompt works
Swift rings are an unusual abstraction that is hard to reason about without hands-on practice. This prompt walks through the routine ring operations and flags the destructive ones.
How to use it
- Build for the future: partition power should accommodate growth.
- Stage rebalances: don't distribute a new ring and start another cluster operation at the same time.
- Monitor replication after every change.
- Test dispersion post-rebalance.
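The sizing advice above can be made concrete with a rule of thumb from the Swift deployment documentation: take the largest number of drives the cluster is ever expected to hold, multiply by roughly 100 partitions per drive, and round up to the next power of two. The sketch below only does that arithmetic. `part_power` is a hypothetical helper name, and the figure of 100 partitions per drive is a starting assumption rather than a hard rule.

```shell
#!/usr/bin/env bash
# Estimate a partition power from the maximum expected drive count.
# Usage: part_power MAX_DRIVES [PARTITIONS_PER_DRIVE]
part_power() {
    local max_drives=$1 per_drive=${2:-100}
    local target=$(( max_drives * per_drive ))
    local power=0
    # Find the smallest power such that 2^power >= target.
    while (( (1 << power) < target )); do
        power=$(( power + 1 ))
    done
    echo "$power"
}

# 5000 drives * 100 partitions = 500000, and 2^19 = 524288 is the next power of two.
part_power 5000   # prints 19
```

The result would feed the first argument of `swift-ring-builder <builder_file> create <part_power> <replicas> <min_part_hours>`. Oversizing costs memory and replication passes, so treat the number as a first estimate to review, not a final answer.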
Useful commands
# Ring inspection (run on proxy or admin host with the builders)
swift-ring-builder /etc/swift/object.builder
swift-ring-builder /etc/swift/container.builder
swift-ring-builder /etc/swift/account.builder
# Add a device
swift-ring-builder /etc/swift/object.builder add \
--region 1 --zone 1 --ip 10.0.0.5 --port 6200 \
--device sda --weight 100
# Set weight to 0 to drain a device (for migration / removal)
# Search-value form: r<region>z<zone>-<ip>:<port>/<device>
swift-ring-builder /etc/swift/object.builder set_weight r1z1-10.0.0.5:6200/sda 0
# Rebalance
swift-ring-builder /etc/swift/object.builder rebalance
# Verify
swift-ring-builder /etc/swift/object.builder validate
swift-ring-builder /etc/swift/object.builder
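# Distribute to all nodes (prefer ansible / orchestration; this loop is a quick fallback)
# Assumes the 4th column of each device row holds the IP (or ip:port); the layout varies by Swift release, so check your output first
for HOST in $(swift-ring-builder /etc/swift/object.builder | awk '/^Devices:/ {found=1; next} found {split($4, a, ":"); print a[1]}' | sort -u); do
    scp /etc/swift/object.ring.gz "$HOST":/etc/swift/
done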
# Recon (cluster health)
swift-recon --replication
swift-recon --auditor
swift-recon --md5
# Dispersion (needs a one-time swift-dispersion-populate run beforehand)
swift-dispersion-report
# Per-account / container info
swift stat
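The prompt suggests starting new drives at a low weight and ramping up. One way to pace that is to generate the `set_weight` commands for each step up front and apply them one at a time, with a rebalance, ring distribution, and a replication check between steps. The sketch below only prints the commands and never runs them. `ramp_weights` is a hypothetical helper, and the even step size is an assumption; real ramps are often sized by how much data each step moves.

```shell
#!/usr/bin/env bash
# Print (do not run) the set_weight commands for a stepwise weight ramp.
# Usage: ramp_weights BUILDER DEVICE TARGET_WEIGHT STEPS
ramp_weights() {
    local builder=$1 device=$2 target=$3 steps=$4 i
    for (( i = 1; i <= steps; i++ )); do
        # Integer arithmetic; the final step lands exactly on the target weight.
        echo "swift-ring-builder $builder set_weight $device $(( target * i / steps ))"
    done
}

# Example: ramp a new drive to weight 100 in four steps (25, 50, 75, 100).
ramp_weights /etc/swift/object.builder r1z1-10.0.0.5:6200/sda 100 4
```

Because `min_part_hours` limits how soon partitions can move again, each step after the first may have to wait before its rebalance takes effect.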
Common findings this catches
- Replication lag growing → I/O bottleneck OR cluster imbalance.
- Drive marked for removal but still has data → wait for replicator; don’t remove.
- `set_weight` change not reflected → ring file not redistributed.
- Dispersion errors → rebalance needed; or zone count too low for replicas.
- `min_part_hours` blocking rebalance → wait or lower (carefully).
- PPI required at low partition count + many drives.
- Ring file mismatch across nodes → distribution incomplete; clients see inconsistent responses.
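`swift-recon --md5` is the usual way to spot a ring mismatch. Where recon is not reachable, a manual cross-check can be sketched as below: collect one `host md5` pair per node and list the hosts whose checksum differs from the ring on the admin host. `find_stale_rings` and `hosts.txt` are hypothetical names used for illustration only.

```shell
#!/usr/bin/env bash
# Read "host md5" pairs on stdin and print every host whose ring checksum
# differs from the expected one.
# Usage: find_stale_rings EXPECTED_MD5 < pairs
find_stale_rings() {
    local expected=$1 host sum
    while read -r host sum; do
        [ "$sum" = "$expected" ] || echo "$host"
    done
}

# Hypothetical collection loop (hosts.txt is assumed: one hostname per line):
#   expected=$(md5sum /etc/swift/object.ring.gz | awk '{print $1}')
#   while read -r h; do
#       echo "$h $(ssh -n "$h" md5sum /etc/swift/object.ring.gz | awk '{print $1}')"
#   done < hosts.txt | find_stale_rings "$expected"
```

An empty result means every listed host carries the same ring file as the admin host; any hostname printed still needs the new ring copied to it.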
When to escalate
- Major cluster expansion or contraction: needs a coordinated plan.
- Partition power increase: a non-trivial migration.
- Data corruption suspected: engage the Swift / storage team and check `swift-object-auditor` output.
Related prompts
- Glance Image Lifecycle Management Prompt: manage Glance images (store backends, image signing, format conversion, image cache, multi-store, deletion-protection).
- Linux Block I/O Performance Investigation Prompt: diagnose slow disk I/O, high iowait, queue depth saturation, and storage performance regressions using iostat, blktrace, fio, and per-device metrics.
- OpenStack Capacity Planning Prompt: plan OpenStack capacity (CPU/RAM/disk oversubscription, growth modeling, hypervisor sizing, Cinder backend planning, network bandwidth).