Swift Container Sharding Design Prompt
Plan and execute Swift container database sharding for large accounts — identify hot containers, size shard ranges, run swift-manage-shard-ranges safely, and verify replication without downtime.
- Target user: Storage engineers operating large OpenStack Swift clusters
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior object-storage engineer who has sharded multi-billion-object Swift containers in production without taking the cluster offline.

I will provide:
- Cluster topology (proxy, account, container, object server counts; ring part power)
- The offending container(s): object count, container DB size on disk, listing latency p99
- container-server and container-replicator configs
- Current `swift-manage-shard-ranges` version and auto-sharding settings (`shard_container_threshold`, `shard_shrink_point`)
- Symptoms: slow PUT/DELETE, replication lag, 503s on listings

Your job:
1. **Confirm the diagnosis**: show me how to measure container DB row count, on-disk size, and replication time; explain why a single SQLite container DB over ~1M rows degrades and how sharding fixes it.
2. **Decide auto vs manual sharding**: when to let the `container-sharder` daemon shard containers automatically for hands-off operation vs driving it manually for a known hotspot. List the config keys that gate each path.
3. **Size the shard ranges**: given the object count and key distribution, recommend a target rows-per-shard and the resulting shard count. Explain `find_and_replace` vs `compact` and how to avoid creating thousands of tiny shards.
4. **Dry-run first**: give the exact `swift-manage-shard-ranges <db> find` and `find_and_replace` commands, how to inspect proposed ranges, and what "good" range boundaries look like (even key spread, no single giant range).
5. **Execute the rollout**: order of operations: run the `container-sharder` daemon on the container nodes, enable sharding on the root container so it enters the `sharding` state, watch cleaving progress, confirm shard containers populate in the `.shards_`-prefixed account, then confirm the root container reaches the `sharded` state.
6. **Verify integrity**: how to confirm object counts match pre/post, listings paginate correctly across shard boundaries, and misplaced objects get reconciled.
7. **Rollback / safety**: what to do if cleaving stalls, how to pause the sharder, and why you must never hand-edit the shard ranges stored in a container DB.

Output as: (a) a diagnosis checklist with the exact CLI to gather each metric, (b) a step-by-step sharding runbook with copy-paste commands, (c) a monitoring snippet (sharder logs + recon fields) to watch during rollout, (d) a go/no-go gate list before declaring the container `sharded`. Be explicit about which steps are irreversible and which can run during business hours.
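
Not part of the prompt itself: before pasting your numbers in, it can help to sanity-check the sizing arithmetic that step 3 asks the model to do. The sketch below is a minimal, hypothetical helper (the name `plan_shards` and its return keys are made up for illustration, not part of Swift). It assumes a default of 500,000 rows per shard, which is half of the ~1M row threshold mentioned in the prompt; check your own sharder config for the value actually in effect. It uses plain ceiling division and ignores key distribution and any special handling Swift applies to a small trailing range.

```python
import math


def plan_shards(object_count: int, rows_per_shard: int = 500_000) -> dict:
    """Estimate how many shard ranges a container would split into.

    Hypothetical helper for back-of-envelope planning only; it does not
    read a container DB or call any Swift tooling.
    """
    if object_count < 0:
        raise ValueError("object_count must be non-negative")
    if rows_per_shard <= 0:
        raise ValueError("rows_per_shard must be positive")

    # Ceiling division: every started block of rows needs its own range.
    # A container always has at least one range covering the namespace.
    shard_count = max(1, math.ceil(object_count / rows_per_shard))

    # Rows left for the final range; a very small value here is a hint
    # that the chosen rows_per_shard leaves a tiny trailing shard.
    last_shard_rows = object_count - (shard_count - 1) * rows_per_shard

    return {
        "shard_count": shard_count,
        "rows_per_shard": rows_per_shard,
        "last_shard_rows": last_shard_rows,
    }


if __name__ == "__main__":
    # Example: a 2-billion-object container at the assumed default size.
    print(plan_shards(2_000_000_000))
```

If the estimated shard count runs into the thousands, that is a cue to raise the rows-per-shard target in your input and to ask the model to justify the trade-off, since the prompt explicitly asks it to avoid thousands of tiny shards.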