
Swift Object Storage Ring Management Prompt

Manage Swift rings — add/remove nodes, rebalance, replication health, partition power, dispersion.

Target user
OpenStack storage engineers running Swift
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior OpenStack storage engineer with deep Swift experience — ring management, replication, dispersion, partition power, rebalance impact at scale.

I will provide:
- The cluster shape (nodes, drives per node, regions, zones)
- The symptom (replication lag, drive failure, capacity imbalance, slow rebalance, dispersion warnings)
- `swift-ring-builder` outputs
- Replication logs from swift-object-replicator / swift-container-replicator

Your job:

1. **Ring fundamentals**:
   - Three ring types: account, container, object (one ring per service, plus one object ring per storage policy)
   - **Partition power**: set at `create`; it can only be raised later through the multi-step partition power increase procedure (`swift-object-relinker` does the on-disk relinking)
   - **Replicas**: typically 3, placed in distinct zones
   - **Regions / Zones**: failure domains; replicas are spread across them
2. **For drive failure**:
   - Drain the device by setting its weight to 0 (`swift-ring-builder ring.builder set_weight r<region>z<zone>-<ip>:<port>/<device> 0`)
   - Rebalance ring (`swift-ring-builder ring.builder rebalance`)
   - Distribute the new ring file to ALL Swift nodes, proxies included
   - Replicators move partitions off the device over time
   - Once the device holds no partitions, drop it from the ring (`remove`), then rebalance and redistribute again
3. **For adding capacity**:
   - Add nodes / drives with `add` command
   - Weight new drives appropriately (often start lower, ramp up)
   - Rebalance
   - Distribute ring
   - Wait for replication to fill new drives
4. **For rebalance impact**:
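   - `min_part_hours` is the minimum number of hours before a partition that has just moved may be moved again (set at `create`, commonly 1; change with `set_min_part_hours`)
   - Lower value = partitions may move again sooner, so back-to-back rebalances cause more data movement and higher I/O impact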
   - Production rebalances should be paced
5. **For dispersion**:
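   - `swift-dispersion-report` (after a one-time `swift-dispersion-populate`) shows what percentage of sampled container and object copies sit where the ring says they should
   - Low percentage found → replication has not caught up or devices are down; check the replicators first, and rebalance only if the ring itself is unbalanced (`swift-ring-builder ring.builder dispersion`)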
6. **For replication lag**:
   - `swift-recon --replication` shows replication state per node
   - Replication runs per partition; the bottleneck is usually drives, network, or competing background processes (auditor, updater, expirer)
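7. **For partition power increase** (PPI):
   - A cluster that has outgrown its partition count needs more partitions
   - Start with `swift-ring-builder ring.builder prepare_increase_partition_power`
   - Multi-step process: prepare, `swift-object-relinker relink`, `increase_partition_power`, `swift-object-relinker cleanup`, `finish_increase_partition_power`, writing and distributing the ring between steps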

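Mark DESTRUCTIVE: removing drives from the ring before they're empty (reduced durability, possible data loss), bypassing `min_part_hours` with `pretend_min_part_hours_passed` before a rebalance (cluster-wide I/O storm; note that `rebalance --force` only forces the ring to be saved when few partitions changed and does not bypass `min_part_hours`), changing weights without rebalancing and redistributing the ring.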

---

Cluster shape: [DESCRIBE — N nodes, M drives, regions, zones]
Symptom: [DESCRIBE]
`swift-ring-builder object.builder`:
```
[PASTE]
```
`swift-recon --replication`:
```
[PASTE]
```
Replication logs (recent):
```
[PASTE]
```

Why this prompt works

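Swift rings are an abstraction with no close analogue in other storage systems, and the order of ring operations matters. This prompt walks through each operation in sequence and flags the destructive steps.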

How to use it

  1. Build for the future: partition power should accommodate growth.
  2. Stage rebalances: distribute the new ring and confirm it on every node before starting the next operation.
  3. Monitor replication after every change.
  4. Test dispersion post-rebalance.
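
For step 1, a small sizing helper can make the growth assumption explicit. This is a minimal sketch that assumes the commonly cited rule of thumb of about 100 partitions per drive at the largest size the cluster is expected to reach; `suggest_part_power` is a hypothetical helper name, not part of Swift.

```shell
#!/bin/sh
# Hypothetical helper: print the smallest partition power whose partition
# count (2^power) is at least max_drives * parts_per_drive.
# parts_per_drive defaults to 100, which is a rule-of-thumb assumption.
suggest_part_power() {
    max_drives=$1
    parts_per_drive=${2:-100}
    target=$((max_drives * parts_per_drive))
    power=0
    count=1
    # Double the partition count until it covers the target.
    while [ "$count" -lt "$target" ]; do
        count=$((count * 2))
        power=$((power + 1))
    done
    echo "$power"
}
```

For example, `suggest_part_power 1000` prints `17`, since 2^17 = 131072 is the first power of two at or above 100000. Treat the result as a starting point for discussion, not a fixed answer.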

Useful commands

# Ring inspection (run on proxy or admin host with the builders)
swift-ring-builder /etc/swift/object.builder
swift-ring-builder /etc/swift/container.builder
swift-ring-builder /etc/swift/account.builder

# Add a device
swift-ring-builder /etc/swift/object.builder add \
    --region 1 --zone 1 --ip 10.0.0.5 --port 6200 \
    --device sda --weight 100

# Set weight (for migration / removal)
# Device search value format: r<region>z<zone>-<ip>:<port>/<device>
swift-ring-builder /etc/swift/object.builder set_weight r1z1-10.0.0.5:6200/sda 0

# Rebalance
swift-ring-builder /etc/swift/object.builder rebalance

# Verify
swift-ring-builder /etc/swift/object.builder validate
swift-ring-builder /etc/swift/object.builder

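# Distribute to all nodes (in production, prefer ansible / orchestration)
# Column layout varies by Swift release: this assumes the 4th column of each
# device row is the IP (or ip:port) and that addresses are IPv4.
for HOST in $(swift-ring-builder /etc/swift/object.builder \
        | awk '/^Devices:/{f=1; next} f && $1 ~ /^[0-9]+$/ {split($4, a, ":"); print a[1]}' \
        | sort -u); do
    scp /etc/swift/object.ring.gz "$HOST:/etc/swift/"
done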

# Recon (cluster health)
swift-recon --replication
swift-recon --auditor
swift-recon --md5

# Dispersion
swift-dispersion-report

# Per-account / container info
swift stat
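
Ramping a new drive's weight in stages, rather than adding it at full weight, can be sketched as a dry-run planner. The function below only prints the commands for each stage; it does not run them. `ramp_plan` is a hypothetical name, and the equal-sized steps are an assumption, not a Swift recommendation.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one set_weight + rebalance pair
# per ramp stage, stepping linearly up to the target weight.
# Arguments: builder file, device search value, target weight, number of steps.
ramp_plan() {
    builder=$1
    search=$2
    target=$3
    steps=$4
    i=1
    while [ "$i" -le "$steps" ]; do
        # Integer arithmetic; the last stage always lands exactly on the target.
        weight=$((target * i / steps))
        echo "swift-ring-builder $builder set_weight $search $weight"
        echo "swift-ring-builder $builder rebalance"
        i=$((i + 1))
    done
}
```

Calling `ramp_plan /etc/swift/object.builder r1z1-10.0.0.5:6200/sda 100 4` prints four stages with weights 25, 50, 75 and 100. Between stages, distribute the ring, wait for replication to settle, and let `min_part_hours` elapse before running the next pair.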

Common findings this catches

  • Replication lag growing → I/O bottleneck OR cluster imbalance.
  • Drive marked for removal but still has data → wait for replicator; don’t remove.
  • set_weight change not reflected → ring not rebalanced, or new ring file not redistributed.
  • Dispersion errors → rebalance needed; or zone count too low for replicas.
  • min_part_hours blocking rebalance → wait or lower (carefully).
  • PPI required when partitions per drive is low (a small partition count spread over many drives).
  • Ring file mismatch across nodes → distribution incomplete; clients see inconsistent responses.
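
The ring-mismatch finding can be checked with `swift-recon --md5`. Where recon is unavailable, a rough manual equivalent is to collect one checksum per host and compare them. The sketch below assumes the checksums have already been gathered into lines of the form `host md5`; that input format and the name `ring_mismatches` are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "host md5" pairs on stdin and print every host
# whose checksum differs from the expected value given as the first argument.
ring_mismatches() {
    expected=$1
    # Skip blank or malformed lines; print the host when the checksum differs.
    awk -v want="$expected" 'NF >= 2 && $2 != want { print $1 }'
}
```

Fed the lines `n1 abc`, `n2 abc` and `n3 def`, `ring_mismatches abc` prints `n3`. An empty result means every listed host reported the expected checksum.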

When to escalate

  • Major cluster expansion / contraction — coordinated plan.
  • Partition power increase — non-trivial migration.
  • Data corruption suspected — engage Swift / storage team; check swift-object-auditor output.
