
Swift Container Sharding Design Prompt

Plan and execute Swift container database sharding for very large containers: identify hot containers, size shard ranges, run swift-manage-shard-ranges safely, and verify replication without downtime.

Target user: Storage engineers operating large OpenStack Swift clusters
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior object-storage engineer who has sharded multi-billion-object Swift containers in production without taking the cluster offline.

I will provide:
- Cluster topology (proxy, account, container, object server counts; ring part power)
- The offending container(s): object count, container DB size on disk, listing latency p99
- container-server and container-replicator configs
- Current Swift release (which determines the `swift-manage-shard-ranges` version) and auto-sharding settings (`auto_shard`, `shard_container_threshold`, and `shard_shrink_point` or, on newer releases, `shrink_threshold`)
- Symptoms: slow PUT/DELETE, replication lag, 503s on listings

Your job:

1. **Confirm the diagnosis** — show me how to measure container DB row count, on-disk size, and replication time; explain why a single SQLite container DB over ~1M rows degrades and how sharding fixes it.

2. **Decide auto vs manual sharding** — when to enable the `container-sharder` daemon for hands-off operation vs driving it manually for a known hotspot. List the config keys that gate each path.

3. **Size the shard ranges** — given the object count and key distribution, recommend a target rows-per-shard and the resulting shard count. Explain `find_and_replace` vs `compact` and how to avoid creating thousands of tiny shards.

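4. **Dry-run first** — give the exact `swift-manage-shard-ranges <db> find <rows_per_shard>` command, which only prints proposed ranges and does not modify the DB, and then the `find_and_replace` command that actually writes them. Note that `find_and_replace` is not a dry run: it stores the ranges, and with `--enable` it also marks the container for sharding. Show how to inspect the proposed ranges and what "good" range boundaries look like (even key spread, no single giant range).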

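5. **Execute the rollout** — order of operations: run the `container-sharder` on the container nodes, mark the container for sharding with `swift-manage-shard-ranges <db> enable` so the root DB moves to the `sharding` state, watch cleaving progress, confirm shard containers populate in the `.shards_<account>` account, then confirm the root DB reaches `sharded`.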

6. **Verify integrity** — how to confirm object counts match pre/post, listings paginate correctly across shard boundaries, and misplaced objects get reconciled.

7. **Rollback / safety** — what to do if cleaving stalls, how to pause the sharder, and why you must never hand-edit shard range DBs.

Output as: (a) a diagnosis checklist with the exact CLI to gather each metric, (b) a step-by-step sharding runbook with copy-paste commands, (c) a monitoring snippet (sharder logs + recon fields) to watch during rollout, (d) a go/no-go gate list before declaring the container `sharded`.

Be explicit about which steps are irreversible and which can run during business hours.
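
Before acting on the model's answer to steps 3 and 4, it can help to check the sizing arithmetic and the proposed ranges yourself. The Python sketch below is one minimal way to do that. It assumes the proposed ranges are available as a list of dicts with `lower`, `upper`, and `object_count` keys, which is how this sketch models the output of `find`. The function names, the 10% "tiny shard" cutoff, and the decision to tolerate a small final range are illustrative assumptions, not part of Swift.

```python
def expected_shard_count(object_count, rows_per_shard):
    """Shards needed if each shard holds at most rows_per_shard rows."""
    if object_count < 0 or rows_per_shard <= 0:
        raise ValueError("object_count must be >= 0 and rows_per_shard > 0")
    # Integer ceiling division; an empty container still needs one range.
    return max(1, (object_count + rows_per_shard - 1) // rows_per_shard)


def review_ranges(ranges, rows_per_shard, tiny_fraction=0.1):
    """Return human-readable warnings for ranges that look too big or too small.

    `ranges` is assumed to be a list of dicts with 'lower', 'upper' and
    'object_count' keys (this sketch's model of the `find` output).
    """
    problems = []
    last = len(ranges) - 1
    for i, r in enumerate(ranges):
        count = r["object_count"]
        label = f"range {i} ({r['lower']!r} to {r['upper']!r})"
        if count > rows_per_shard:
            problems.append(f"{label}: {count} rows exceeds target {rows_per_shard}")
        # The final range is the remainder, so it is allowed to be small here.
        elif i != last and count < tiny_fraction * rows_per_shard:
            problems.append(f"{label}: only {count} rows, likely a tiny shard")
    return problems


if __name__ == "__main__":
    # Hypothetical numbers: 1.2 billion objects at 500k rows per shard.
    print(expected_shard_count(1_200_000_000, 500_000))
    sample = [
        {"lower": "", "upper": "m", "object_count": 500_000},
        {"lower": "m", "upper": "n", "object_count": 10},
        {"lower": "n", "upper": "", "object_count": 900_000},
    ]
    for warning in review_ranges(sample, 500_000):
        print(warning)
```

If `review_ranges` returns any warnings, treat that as a reason to rerun `find` with a different rows-per-shard value before writing ranges to the DB.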
