AWS Error Guide: '503 SlowDown' and ServiceUnavailable S3 Request-Rate Failures

Overview

503 SlowDown (and the related 503 ServiceUnavailable) is S3 telling you to back off: your request rate against a key prefix is climbing faster than the partition behind it can scale. S3 documents a baseline of at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix and scales capacity per prefix automatically, but that scaling is gradual. A sudden burst against one prefix, or a workload concentrated on a single “hot” key range, outpaces the partition and is throttled with a retryable 503.

You see it from the CLI or an SDK:

An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4): Please reduce your request rate.

Or the transient infrastructure variant:

An error occurred (503) when calling the GetObject operation: Service Unavailable

It occurs on PutObject, GetObject, DeleteObject, ListObjectsV2, and multipart operations — most often during bulk ingest, large parallel copies/migrations, or analytics jobs that read thousands of objects under one prefix.

Symptoms

Bulk upload/download jobs intermittently fail with SlowDown / 503 and slow down under load.
The same operation succeeds at low concurrency but fails when parallelized.
CloudWatch S3 request metrics show 5xxErrors rising with request count.
Errors cluster on one prefix while other prefixes are fine.

aws s3 cp ./batch/ s3://data-lake/ingest/2026-06-23/ --recursive

upload failed: ./batch/f8123.json to s3://data-lake/ingest/2026-06-23/f8123.json An error occurred (SlowDown) when calling the PutObject operation: Please reduce your request rate.

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name 5xxErrors \
  --dimensions Name=BucketName,Value=data-lake Name=FilterId,Value=EntireBucket \
  --start-time "$(date -u -d '30 min ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --period 300 --statistics Sum \
  --query 'sort_by(Datapoints, &Timestamp)[].Sum' --output text

0.0	0.0	142.0	318.0

Common Root Causes

1. Burst against a single prefix

All requests target one prefix (e.g. a date folder) faster than S3 can scale that partition. The 5xx error count rises in lockstep with request rate on that prefix.

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name AllRequests \
  --dimensions Name=BucketName,Value=data-lake Name=FilterId,Value=EntireBucket \
  --start-time "$(date -u -d '30 min ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --period 300 --statistics Sum \
  --query 'sort_by(Datapoints, &Timestamp)[].Sum' --output text

1200.0	1180.0	9800.0	14200.0

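Request volume jumping from about 1.2k to about 14k per 5-minute period outpaces the partition. The EntireBucket filter measures the whole bucket, so to confirm that one prefix carries the spike, create a request-metrics filter scoped to that prefix and query it by its FilterId. Then spread the load across prefixes.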
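A quick way to see whether errors track load is to divide the two series period by period. The sketch below is illustrative only: error_rate is a hypothetical helper, the inputs are the sample sums shown in this guide, and it assumes both series are aligned in time order.

```shell
# error_rate ERRORS REQUESTS
# Each argument is a whitespace-separated list of per-period sums, aligned by
# position. Prints the 5xx share of requests for each period as a percentage.
error_rate() {
  awk -v e="$1" -v r="$2" 'BEGIN {
    n = split(e, err, /[ \t]+/)
    split(r, req, /[ \t]+/)
    for (i = 1; i <= n; i++) {
      # Periods with no requests report 0 rather than dividing by zero.
      pct = (req[i] > 0) ? 100 * err[i] / req[i] : 0
      printf "%s%.2f%%", (i > 1 ? " " : ""), pct
    }
    print ""
  }'
}

# Sample sums copied from the 5xxErrors and AllRequests outputs in this guide.
error_rate "0.0 0.0 142.0 318.0" "1200.0 1180.0 9800.0 14200.0"
# prints: 0.00% 0.00% 1.45% 2.24%
```

An error share that is zero at low volume and grows as volume grows is the signature of rate-driven throttling rather than a transient fault.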

2. Hot key / sequential-prefix design

Keys that share one long leading prefix and vary only in their final segment (timestamps, sequential IDs) concentrate writes on one partition, because S3 has no varying leading characters on which to split the load.

aws s3api list-objects-v2 --bucket data-lake --prefix ingest/2026-06-23/ \
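  --query 'length(Contents)' --output json

(JSON output is deliberate: with --output text the CLI applies --query to each page of results and prints one count per page.) Tens of thousands of objects under one sequential date prefix is a classic hot-prefix pattern. Add a high-entropy prefix segment (e.g. a hash) to distribute writes.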
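One way to add such a segment is to derive it from the object name. This is a sketch under assumptions, not a prescribed scheme: hashed_key is a hypothetical helper, the 2-character shard (256 possible values) is an arbitrary choice, and cksum is used only because it is available on any POSIX system.

```shell
# hashed_key BASE_PREFIX OBJECT_NAME
# Derives a 2-hex-character shard from a CRC of the object name and inserts
# it between the base prefix and the name.
hashed_key() {
  crc=$(printf '%s' "$2" | cksum | cut -d ' ' -f 1)
  shard=$(printf '%02x' $((crc % 256)))
  printf '%s/%s/%s\n' "$1" "$shard" "$2"
}

# Prints a key of the form ingest/2026-06-23/<2-hex-chars>/f8123.json
hashed_key ingest/2026-06-23 f8123.json
```

Because the shard is computed from the name alone, a reader that knows the object name can recompute the full key without listing the prefix.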

3. Missing or weak client retries

S3 503s are explicitly retryable with backoff. A client that does not retry (or retries with no backoff) surfaces every transient throttle as a hard failure.

aws configure get retry_mode; aws configure get max_attempts

legacy
3

legacy retry mode recognizes a narrower set of throttling errors than standard and has no client-side rate limiting; switch to standard or adaptive and raise max_attempts.
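The CLI and SDKs apply backoff per request internally. For a wrapper script that retries a whole command, the same idea (capped exponential backoff with full jitter) can be sketched as below. Everything named here is hypothetical: retry_with_backoff, BACKOFF_BASE and BACKOFF_CAP are illustrative, the function assumes /dev/urandom exists, and it retries on any non-zero exit rather than inspecting the error.

```shell
# retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached. Before each retry
# it sleeps a random whole number of seconds between 0 and
# min(BACKOFF_CAP, BACKOFF_BASE * 2^(attempt-1)), i.e. "full jitter".
retry_with_backoff() {
  max_attempts=$1
  shift
  base=${BACKOFF_BASE:-1}   # hypothetical tunables, in seconds
  cap=${BACKOFF_CAP:-30}
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max_attempts" ] && return 1
    ceiling=$((base * (1 << (attempt - 1))))
    [ "$ceiling" -gt "$cap" ] && ceiling=$cap
    # Two random bytes from /dev/urandom give a value in 0..65535.
    rand=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    rand=${rand:-0}
    sleep $((rand % (ceiling + 1)))
    attempt=$((attempt + 1))
  done
}

# Usage (not run here):
#   retry_with_backoff 5 aws s3 cp ./batch/f8123.json s3://data-lake/ingest/2026-06-23/
```

The jitter matters as much as the exponent: without it, parallel workers that were throttled together retry together and hit the prefix in the same instant again.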

4. Excessive concurrency saturating the prefix

A high parallelism setting (many threads/workers) drives the per-prefix rate past what scaling can absorb. More concurrency on one prefix does not raise the limit — it trips it sooner.

aws configure get s3.max_concurrent_requests

A value of 40 means up to 40 requests in flight against one prefix (the CLI default is 10), which can overwhelm a fresh partition; lower it or fan out across prefixes.
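How quickly concurrency turns into request rate follows from Little's law: throughput is roughly the number of in-flight requests divided by the time each one takes. The numbers below are assumptions for illustration (est_rate is a hypothetical helper and 25 ms is a made-up latency for small objects), not measurements.

```shell
# est_rate CONCURRENCY MEAN_LATENCY_MS
# Little's law: requests per second = in-flight requests / seconds per request.
est_rate() {
  awk -v c="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", c * 1000 / ms }'
}

est_rate 40 25   # prints: 1600
est_rate 10 25   # prints: 400
```

Under these assumed numbers a single process at 40-way concurrency issues about 1,600 requests per second, so a few such processes writing to the same prefix approach S3's documented baseline of 3,500 writes per second per partitioned prefix.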

5. List-heavy workloads on a large prefix

Frequent ListObjectsV2 over a prefix with millions of objects is expensive and contributes to the request rate, compounding throttling during reads/writes.

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name ListRequests \
  --dimensions Name=BucketName,Value=data-lake Name=FilterId,Value=EntireBucket \
  --start-time "$(date -u -d '30 min ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --period 300 --statistics Sum \
  --query 'sort_by(Datapoints, &Timestamp)[].Sum' --output text

60.0	58.0	4200.0	5100.0

Thousands of LIST calls indicate a job enumerating a huge prefix repeatedly — use an inventory/manifest instead.
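When a full S3 Inventory is more than the job needs, a stored manifest can be as simple as listing once and reusing the result. A minimal sketch, assuming the key set does not change between runs; cached_listing and the manifest path are hypothetical, and nothing here refreshes a stale manifest.

```shell
# cached_listing MANIFEST_FILE COMMAND [ARGS...]
# Runs COMMAND (the expensive listing) only when MANIFEST_FILE is missing or
# empty, saves its output there, and prints the manifest either way.
cached_listing() {
  manifest=$1
  shift
  if [ ! -s "$manifest" ]; then
    # Do not leave a partial manifest behind if the listing fails.
    "$@" > "$manifest" || { rm -f "$manifest"; return 1; }
  fi
  cat "$manifest"
}

# Usage (not run here); the query prints one key per line:
#   cached_listing /tmp/ingest-2026-06-23.keys \
#     aws s3api list-objects-v2 --bucket data-lake --prefix ingest/2026-06-23/ \
#     --query 'Contents[].[Key]' --output text
```

Every later step of the job reads the file instead of issuing LIST calls, so the listing cost is paid once per manifest rather than once per step.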

6. A transient S3 ServiceUnavailable (infrastructure)

Occasionally the 503 is ServiceUnavailable from a brief internal hiccup, not your rate. It is rare, short-lived, and resolved purely by retry.

aws s3api head-object --bucket data-lake --key ingest/2026-06-23/f8123.json 2>&1

An error occurred (503) when calling the HeadObject operation: Service Unavailable

If request rates are modest and the error vanishes on retry, treat it as transient — robust retries handle it.

Diagnostic Workflow

Step 1: Confirm SlowDown vs. ServiceUnavailable and the prefix

aws s3 cp <SOURCE> s3://<BUCKET>/<PREFIX>/ --recursive 2>&1 | grep -oE '(SlowDown|Service Unavailable)'

SlowDown (“Please reduce your request rate”) is a rate problem; Service Unavailable may be transient. Note the prefix in the failing keys.
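To note the prefix systematically rather than by eye, the CLI's error lines can be tallied by error code and destination prefix. This is a sketch that assumes the message format shown in this guide; tally_failures is a hypothetical helper, and lines without an s3:// URI are grouped under "-".

```shell
# tally_failures
# Reads CLI output on stdin and prints "COUNT CODE PREFIX" for every
# combination of error code and destination prefix, most frequent first.
tally_failures() {
  awk '
    /An error occurred \(/ {
      # The error code is the text inside the first pair of parentheses.
      code = $0
      sub(/.*An error occurred \(/, "", code)
      sub(/\).*/, "", code)
      # The prefix is the s3:// URI with the object name stripped.
      prefix = "-"
      for (i = 1; i <= NF; i++)
        if ($i ~ "^s3://") { prefix = $i; sub("[^/]*$", "", prefix) }
      count[code " " prefix]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -rn
}

# Sample line copied from the failure shown earlier in this guide.
printf '%s\n' "upload failed: ./batch/f8123.json to s3://data-lake/ingest/2026-06-23/f8123.json An error occurred (SlowDown) when calling the PutObject operation: Please reduce your request rate." | tally_failures
# prints: 1 SlowDown s3://data-lake/ingest/2026-06-23/
```

In practice the input would be the captured output of the failing job, for example aws s3 cp ... 2>&1 | tally_failures.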

Step 2: Correlate request rate with 5xx in CloudWatch

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name 5xxErrors \
  --dimensions Name=BucketName,Value=<BUCKET> Name=FilterId,Value=EntireBucket \
  --start-time "$(date -u -d '30 min ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --period 300 --statistics Sum \
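  --query 'sort_by(Datapoints, &Timestamp)[].Sum' --output text

(Requires S3 request metrics enabled.) CloudWatch returns datapoints in no guaranteed order, so the query sorts them by Timestamp. Run the same query with --metric-name AllRequests; 5xx climbing with AllRequests confirms rate-driven throttling.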

Step 3: Check object distribution under the hot prefix

aws s3api list-objects-v2 --bucket <BUCKET> --prefix <HOT_PREFIX>/ \
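  --query 'length(Contents)' --output json

A huge count under one sequential prefix signals a hot-partition design problem. (Use JSON output here: with --output text the CLI applies --query to each page of results and prints one count per page.)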
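A single count hides how keys are spread across sub-prefixes. Given a list of keys, the distribution at a chosen depth can be sketched as follows; count_by_prefix is a hypothetical helper and the sample keys are made up for illustration.

```shell
# count_by_prefix DEPTH
# Reads object keys on stdin (one per line) and prints how many keys fall
# under each prefix made of the first DEPTH path components.
count_by_prefix() {
  awk -F/ -v depth="$1" '
    {
      # Never treat the object name itself as part of the prefix.
      n = (NF - 1 < depth) ? NF - 1 : depth
      prefix = ""
      for (i = 1; i <= n; i++) prefix = prefix $i "/"
      count[prefix]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -rn
}

# Made-up sample keys: three under one date prefix, one under another.
printf '%s\n' \
  ingest/2026-06-23/a.json ingest/2026-06-23/b.json \
  ingest/2026-06-23/c.json ingest/2026-06-22/d.json | count_by_prefix 2
# prints:
# 3 ingest/2026-06-23/
# 1 ingest/2026-06-22/
```

A real key list can come from a listing with --query 'Contents[].[Key]' --output text, which prints one key per line. One prefix dominating the counts at the depth your writers target is the pattern to look for.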

Step 4: Inspect client retry and concurrency settings

aws configure get retry_mode; aws configure get max_attempts
aws configure get s3.max_concurrent_requests

legacy retries and high concurrency on one prefix are the controllable contributors.

Step 5: Apply backoff / spread, then re-run

export AWS_RETRY_MODE=adaptive AWS_MAX_ATTEMPTS=8
aws configure set s3.max_concurrent_requests 10
aws s3 cp <SOURCE> s3://<BUCKET>/<PREFIX>/ --recursive

Adaptive retries plus reduced concurrency (and, longer term, more prefixes) clear the throttling.

Example Root Cause Analysis

A nightly ingest job writing sensor data to s3://data-lake/ingest/<date>/ began failing with SlowDown as data volume grew. All writes for a day landed under one date prefix.

CloudWatch showed 5xx tracking request rate, and the prefix held a huge object count:

aws s3api list-objects-v2 --bucket data-lake --prefix ingest/2026-06-23/ \
  --query 'length(Contents)' --output json

Over 600k objects written into one sequential date prefix with 40-way concurrency — the partition could not scale fast enough for the burst. Retry mode was also legacy:

aws configure get retry_mode

legacy

Fix (two parts): immediately, enable adaptive retries and cut concurrency so the job completes:

export AWS_RETRY_MODE=adaptive AWS_MAX_ATTEMPTS=8
aws configure set s3.max_concurrent_requests 12

And durably, change the key scheme to inject a high-entropy segment so writes spread across partitions:

ingest/2026-06-23/<2-char-hash>/<sensor-id>.json

After the key change, the same volume wrote without 503s because S3 split the load across many prefixes.

Prevention Best Practices

Design keys to spread load: avoid pure sequential/timestamp prefixes for high-write workloads; inject a high-entropy segment (hash) so S3 can partition across prefixes.
Always retry 503s with exponential backoff and jitter — use the SDK’s adaptive retry mode (AWS_RETRY_MODE=adaptive) rather than failing on the first throttle.
Tune concurrency to the prefix, not the machine; more parallel requests against one prefix trip the limit sooner, they do not raise it.
Replace repeated ListObjectsV2 over huge prefixes with S3 Inventory or a stored manifest to cut request volume.
Enable S3 request metrics so you can correlate 5xxErrors with AllRequests, and add prefix-scoped metrics filters so you can see which prefix is hot.

Quick Command Reference

# Confirm SlowDown vs. ServiceUnavailable
aws s3 cp <SOURCE> s3://<BUCKET>/<PREFIX>/ --recursive 2>&1 | grep -oE '(SlowDown|Service Unavailable)'

# Correlate 5xx with request rate (needs request metrics)
aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name 5xxErrors \
  --dimensions Name=BucketName,Value=<BUCKET> Name=FilterId,Value=EntireBucket \
  --start-time "$(date -u -d '30 min ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --period 300 --statistics Sum --query 'sort_by(Datapoints, &Timestamp)[].Sum' --output text

# Object count under the hot prefix
aws s3api list-objects-v2 --bucket <BUCKET> --prefix <HOT_PREFIX>/ --query 'length(Contents)' --output json

# Client retry and concurrency settings
aws configure get retry_mode; aws configure get max_attempts
aws configure get s3.max_concurrent_requests

# Re-run with backoff and lower concurrency
AWS_RETRY_MODE=adaptive AWS_MAX_ATTEMPTS=8 aws s3 cp <SOURCE> s3://<BUCKET>/<PREFIX>/ --recursive

Conclusion

503 SlowDown / ServiceUnavailable almost always means your request rate against a prefix is outpacing S3’s per-partition scaling. The common root causes:

A burst against a single prefix faster than it can scale.
A hot-key / sequential-prefix design concentrating load on one partition.
Missing or weak client retries (503 is retryable).
Excessive concurrency saturating one prefix.
List-heavy workloads inflating the request rate.
A genuinely transient ServiceUnavailable resolved by retry.

Confirm it is rate-driven, add adaptive backoff, reduce concurrency, and spread keys across prefixes — durable fixes come from key design, not just retrying harder.
