
Swift Erasure Coding Storage Policy Design Prompt

Design Swift erasure-coding storage policies: pick the EC scheme, data/parity fragment counts, and region layout to cut raw-capacity cost while keeping durability and read latency acceptable.

Target user
Object-storage operators scaling Swift capacity efficiently
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior Swift operator who has rolled out erasure-coding storage policies at petabyte scale and knows where EC saves money and where replication is still the right call.

I will provide:
- Cluster topology: regions, zones, nodes, disks, and inter-zone network bandwidth/latency
- Current policies (replication factor) and capacity/cost pressure
- Workload profile: object size distribution, read/write ratio, latency SLA
- Available PyECLib/liberasurecode backends (e.g., `liberasurecode_rs_vand`, or ISA-L via `isa_l_rs_vand` / `isa_l_rs_cauchy`)
- Durability target and failure-domain requirements

Your job:

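1. **EC vs replication**: explain the real trade-off. EC cuts raw-capacity overhead, but it adds encode/decode CPU on the proxies, widens per-request fan-out (with k data and m parity fragments, a PUT writes to k+m object servers and a healthy GET reads from k, versus 3 and 1 for 3x replication), and handles small objects poorly. State clearly when to keep 3x replication (small objects, latency-critical data) and when to use EC (large objects, warm/cold capacity).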

2. **Scheme selection**: choose `ec_type`, `ec_num_data_fragments` (k), and `ec_num_parity_fragments` (m), then compute the resulting storage overhead ((k+m)/k raw bytes per usable byte) and durability (any m fragments can be lost, so state how many disk, node, and zone failures that covers). Show the math, not a vibe.

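3. **Failure-domain placement**: map fragments across zones/regions so the policy actually survives the failure domain it claims. With equally weighted zones the ring spreads the k+m fragments as evenly as it can, so surviving the loss of one zone requires that no zone hold more than m fragments, i.e. ceil((k+m)/zones) <= m. Warn about schemes that need more zones than the cluster has.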

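4. **Performance**: explain the effect of `ec_object_segment_size` (default 1 MiB), the reconstruction cost on reads when fragments are missing, and the proxy CPU load (favor ISA-L). Identify the small-object penalty and a size threshold below which objects should stay on a replicated policy; because Swift assigns a policy per container, that routing has to happen in the application or client.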

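5. **Policy rollout**: add the new policy to `swift.conf` identically on every proxy and storage node (nodes that disagree on policy indexes or EC parameters will mishandle the data, and an EC policy's parameters must never change once it holds objects); verify consistency with `swift-recon --md5`. Build the EC object ring (`object-<index>.ring.gz`) with a replica count of k+m, and cover default-policy considerations: existing data does NOT move, and only containers created afterward use the new policy, either via the `X-Storage-Policy` header or by it becoming the default.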

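6. **Migration**: a container's storage policy cannot be changed after it is created, so plan how to move existing data into new EC-policy containers (server-side copy or migration tooling) without downtime, including how writes that arrive during the copy are caught up before clients cut over.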

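7. **Validation**: write and read objects at the target sizes, kill a zone, confirm degraded reads reconstruct correctly and latency stays within SLA, then confirm the object reconstructor restores all k+m fragments after the zone returns.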

Output as: (a) EC-vs-replication decision per workload, (b) chosen scheme with overhead/durability math, (c) ring/zone placement plan, (d) rollout steps with the swift.conf consistency guardrail, (e) a failure-injection validation plan.

Bias toward: matching EC scheme to actual failure domains, keeping small/latency-critical data on replication, and proving reconstruction before trusting durability.
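The trade-off in step 1 can be made concrete by comparing raw capacity and per-request fan-out for the same usable capacity. This is a minimal Python sketch; the 10+4 scheme and the 1000 TB target are hypothetical inputs, and the fan-out counts assume a healthy cluster (no handoffs, no degraded reads).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """How one storage layout behaves for a single object on a healthy cluster."""
    name: str
    put_fanout: int        # object servers written per PUT
    get_fanout: int        # object servers read per healthy GET
    raw_per_usable: float  # raw bytes stored per usable byte


def replication(replicas: int) -> Layout:
    # Each replica is a full copy; a healthy GET needs only one of them.
    return Layout(f"{replicas}x replication", replicas, 1, float(replicas))


def erasure_coding(k: int, m: int) -> Layout:
    # k data + m parity fragments of ~1/k object size each; a GET needs k of them.
    return Layout(f"EC {k}+{m}", k + m, k, (k + m) / k)


def raw_capacity_tb(layout: Layout, usable_tb: float) -> float:
    """Raw capacity needed to hold `usable_tb` of objects under `layout`."""
    return usable_tb * layout.raw_per_usable


if __name__ == "__main__":
    usable_tb = 1000.0  # hypothetical usable-capacity target
    for layout in (replication(3), erasure_coding(10, 4)):
        print(f"{layout.name:16} raw={raw_capacity_tb(layout, usable_tb):7.1f} TB  "
              f"PUT fan-out={layout.put_fanout:2}  GET fan-out={layout.get_fanout:2}")
```

The capacity column is where EC wins; the fan-out columns are where it pays, since every request touches far more object servers.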
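For step 2, overhead is (k+m)/k and an object survives any m lost fragments. The sketch below adds a rough loss probability from a binomial model. It assumes every fragment sits on a separate device and that devices fail independently with a hypothetical probability p within one repair window; real failures are correlated, so the numbers are optimistic and mainly useful for comparing schemes with each other. 3x replication falls out as the k=1, m=2 case.

```python
from math import comb


def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per usable byte with k data and m parity fragments."""
    return (k + m) / k


def p_object_loss(k: int, m: int, p: float) -> float:
    """Probability that more than m of an object's k+m fragments fail together.

    Binomial model: each fragment is on its own device and devices fail
    independently with probability p during one repair window. Correlated
    failures (shared node, rack, power) make the real risk higher.
    """
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.001  # hypothetical per-device failure probability per repair window
    for k, m in ((1, 2), (4, 2), (8, 3), (10, 4)):
        label = "3x replication" if (k, m) == (1, 2) else f"EC {k}+{m}"
        print(f"{label:15} overhead={ec_overhead(k, m):.2f}x  "
              f"survives {m} lost fragments  P(loss)~{p_object_loss(k, m, p):.1e}")
```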
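The zone rule from step 3 can be checked mechanically. This sketch assumes equally weighted zones and approximates the ring's placement with an even round-robin split; unequal weights or partially populated zones can make the real distribution lumpier, so confirm against the builder's own dispersion report before trusting it. The same arithmetic applies to regions.

```python
from math import ceil


def fragments_per_zone(k: int, m: int, zones: int) -> list[int]:
    """Even round-robin split of k+m fragments over equally weighted zones."""
    base, extra = divmod(k + m, zones)
    return [base + (1 if z < extra else 0) for z in range(zones)]


def survives_zone_loss(k: int, m: int, zones: int, lost_zones: int = 1) -> bool:
    """True if losing the `lost_zones` fullest zones still leaves >= k fragments."""
    counts = sorted(fragments_per_zone(k, m, zones), reverse=True)
    return sum(counts[:lost_zones]) <= m


def min_zones_for_zone_loss(k: int, m: int) -> int:
    """Smallest zone count for which ceil((k+m)/zones) <= m holds."""
    return ceil((k + m) / m)


if __name__ == "__main__":
    k, m = 10, 4  # hypothetical scheme
    print("minimum zones:", min_zones_for_zone_loss(k, m))
    for zones in (3, 4, 6):
        print(zones, fragments_per_zone(k, m, zones), survives_zone_loss(k, m, zones))
```

For 10+4 this reports a minimum of 4 zones: with three zones, at least one zone holds 5 fragments, one more than the scheme can lose.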
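The small-object penalty in step 4 is easiest to see in on-disk bytes. This sketch estimates the storage for one object under EC and under 3x replication, then searches for the size where EC starts using fewer bytes. The 80-byte fragment header, the 4 KiB filesystem block, and the absence of backend alignment padding are assumptions, not values read from Swift or liberasurecode; the 1 MiB segment size matches Swift's default `ec_object_segment_size`. Bytes are only one input: per-request fan-out, file counts, and latency can push the real routing threshold well above this byte break-even.

```python
from math import ceil

MIB = 1024 * 1024


def on_disk(nbytes: int, block: int) -> int:
    """Round a file's size up to whole filesystem blocks (minimum one block)."""
    return max(1, ceil(nbytes / block)) * block


def ec_on_disk(size: int, k: int, m: int, segment_size: int = MIB,
               frag_header: int = 80, block: int = 4096) -> int:
    """Approximate on-disk bytes for one object stored as k+m fragment archives.

    Assumed (not taken from Swift): 80-byte header per fragment, no alignment
    padding, each archive rounded up to 4 KiB blocks.
    """
    full, last = divmod(size, segment_size)
    segments = [segment_size] * full + ([last] if last else [])
    # Each segment contributes ceil(segment/k) payload bytes plus a header to every archive.
    archive = sum(ceil(seg / k) + frag_header for seg in segments)
    return (k + m) * on_disk(archive, block)


def replicated_on_disk(size: int, replicas: int = 3, block: int = 4096) -> int:
    return replicas * on_disk(size, block)


def byte_break_even(k: int, m: int, step: int = 1024, limit: int = 16 * MIB) -> int:
    """Smallest size (in `step` increments) at which EC uses fewer disk bytes than 3x."""
    for size in range(step, limit + 1, step):
        if ec_on_disk(size, k, m) < replicated_on_disk(size):
            return size
    raise ValueError("no break-even point below limit")


if __name__ == "__main__":
    for k, m in ((4, 2), (10, 4)):
        print(f"EC {k}+{m}: fewer disk bytes than 3x from ~{byte_break_even(k, m)} bytes")
    print("1 KiB object:", ec_on_disk(1024, 10, 4), "vs", replicated_on_disk(1024))
```

Under these assumptions a 10+4 policy spends 14 blocks on a 1 KiB object versus 3 for replication, which is the core reason tiny objects belong on a replicated policy.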
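Step 5's guardrails can be scripted. This sketch renders a `[storage-policy:N]` section, builds the `swift-ring-builder ... create` argument list with a replica count of k+m, and flags nodes whose `swift.conf` differs from the most common version, the same idea `swift-recon --md5` checks across a cluster. The policy name, index 2, `isa_l_rs_vand`, part power 16, and `min_part_hours` of 1 are hypothetical examples, and collecting each node's file is left to your own tooling.

```python
import hashlib
from collections import Counter


def ec_policy_stanza(index: int, name: str, ec_type: str, k: int, m: int,
                     segment_size: int = 1048576) -> str:
    """Render a swift.conf section for an erasure-coding storage policy."""
    return "\n".join([
        f"[storage-policy:{index}]",
        f"name = {name}",
        "policy_type = erasure_coding",
        f"ec_type = {ec_type}",
        f"ec_num_data_fragments = {k}",
        f"ec_num_parity_fragments = {m}",
        f"ec_object_segment_size = {segment_size}",
    ]) + "\n"


def ring_create_cmd(index: int, k: int, m: int, part_power: int,
                    min_part_hours: int = 1) -> list[str]:
    """Arguments for creating the policy's object ring; replicas must equal k+m."""
    return ["swift-ring-builder", f"object-{index}.builder", "create",
            str(part_power), str(k + m), str(min_part_hours)]


def mismatched_nodes(confs: dict[str, str]) -> list[str]:
    """Nodes whose swift.conf text differs from the most common version."""
    digests = {node: hashlib.sha256(text.encode()).hexdigest()
               for node, text in confs.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, digest in digests.items() if digest != majority)


if __name__ == "__main__":
    stanza = ec_policy_stanza(2, "ec10-4", "isa_l_rs_vand", 10, 4)
    print(stanza)
    print(" ".join(ring_create_cmd(2, 10, 4, part_power=16)))
```

After `create`, the usual `add` and `rebalance` steps follow, and the resulting `object-2.ring.gz` has to reach every node alongside the updated `swift.conf`.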
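One way to approach step 6 is server-side copy: create the destination container with the EC policy, issue a `PUT` with `X-Copy-From` for each object, and repeat a delta pass until the source and destination listings match. This sketch only builds the requests. The account path, container names, policy name, and size cutoff are hypothetical; listings are assumed to be dicts with `name`, `bytes`, and `hash` as in a JSON container listing; and large-object manifests (SLO/DLO) need separate handling not shown here.

```python
from urllib.parse import quote


def create_container_request(account: str, container: str, policy: str):
    """PUT that creates the destination container; its policy is fixed from now on."""
    return ("PUT", f"{account}/{quote(container)}", {"X-Storage-Policy": policy})


def needs_copy(src_listing: list[dict], dst_listing: list[dict], min_size: int) -> list[dict]:
    """Source objects at or above min_size that are missing or differ in the destination."""
    dst_hashes = {obj["name"]: obj["hash"] for obj in dst_listing}
    return [obj for obj in src_listing
            if obj["bytes"] >= min_size and dst_hashes.get(obj["name"]) != obj["hash"]]


def copy_requests(account: str, src: str, dst: str, objects: list[dict]):
    """Yield server-side copy requests (method, path, headers) for each object."""
    for obj in objects:
        name = quote(obj["name"])  # keeps "/" in pseudo-directory names
        yield ("PUT", f"{account}/{quote(dst)}/{name}",
               {"X-Copy-From": f"/{quote(src)}/{name}", "Content-Length": "0"})


if __name__ == "__main__":
    account = "/v1/AUTH_demo"  # hypothetical account path
    src_listing = [{"name": "video/a b.mp4", "bytes": 50_000_000, "hash": "e1"},
                   {"name": "thumb.jpg", "bytes": 8_000, "hash": "e2"}]
    print(create_container_request(account, "media-ec", "ec10-4"))
    for req in copy_requests(account, "media", "media-ec",
                             needs_copy(src_listing, [], min_size=1 << 20)):
        print(req)
```

Running `needs_copy` after each pass shrinks the delta; once it is small, switch clients to the new container and run one final pass to catch writes that landed during the switch.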
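For step 7, the pass/fail decision reduces to two checks: every degraded read returns the original bytes, and degraded p99 latency stays within the SLA. This sketch assumes latency samples (in milliseconds) and response bodies were already collected with your own load tool while a zone was down; the sample data and SLA value in the usage are placeholders. The MD5 comparison relies on a plain (non-manifest) object's ETag being the MD5 of its content.

```python
import hashlib
from math import ceil


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def body_matches(body: bytes, etag: str) -> bool:
    """For a plain (non-manifest) object, the ETag is the MD5 of its content."""
    return hashlib.md5(body).hexdigest() == etag.strip('"')


def degraded_read_report(healthy_ms: list[float], degraded_ms: list[float],
                         bad_reads: int, sla_p99_ms: float) -> dict:
    """Pass only if no degraded read failed or returned wrong bytes and p99 meets SLA."""
    degraded_p99 = percentile(degraded_ms, 99)
    return {
        "healthy_p99_ms": percentile(healthy_ms, 99),
        "degraded_p99_ms": degraded_p99,
        "bad_reads": bad_reads,
        "pass": bad_reads == 0 and degraded_p99 <= sla_p99_ms,
    }


if __name__ == "__main__":
    healthy = [float(x) for x in range(20, 120)]  # placeholder samples, not measurements
    degraded = [x * 1.6 for x in healthy]
    print(degraded_read_report(healthy, degraded, bad_reads=0, sla_p99_ms=250.0))
```

Reporting the healthy p99 next to the degraded one shows how much reconstruction costs, not only whether the SLA held.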