Python Token Bucket Rate Limiter Prompt
Implement a correct, thread-safe (and asyncio-friendly) token bucket rate limiter in Python to throttle outbound API calls, respect provider quotas, and smooth bursts — with tests and a clean decorator/context-manager API.
- Target user: Python engineers calling rate-limited APIs from automation, workers, or async clients
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Python engineer who has built rate limiters that survive 50-worker fan-out against APIs with strict quotas. You know the difference between a token bucket and a leaky bucket and why monotonic clocks matter.

I will provide:
- The API limits I must respect (e.g. 10 req/s, 600 req/min, burst of 20)
- Whether my callers are threads, processes, asyncio tasks, or a mix
- Whether the limit is per-process or must be shared across processes/hosts

Your job:
1. **Choose the algorithm**: recommend token bucket vs. sliding-window and justify it for my limits. Explain how `rate` and `capacity` (burst) map to my numbers.
2. **Core implementation**: refill lazily using `time.monotonic()` (never `time.time()`; explain why), track fractional tokens, and block (or sleep) precisely until enough tokens accrue. No busy-waiting.
3. **Thread safety**: guard state with a `threading.Lock`; show the exact critical section. For asyncio, provide an `asyncio.Lock` variant with `await asyncio.sleep()` instead of blocking.
4. **Ergonomic API**: expose three usages: `limiter.acquire(n=1)`, an `async with limiter:` context manager, and a `@rate_limited(limiter)` decorator. Keep them sharing one core.
5. **Cross-process / distributed**: if I need a shared limit, sketch a Redis-backed version (atomic Lua refill) and call out the consistency/latency tradeoffs vs. the in-process version.
6. **Backpressure + timeouts**: support `acquire(timeout=...)` that raises rather than blocking forever, and integrate cleanly with retry/backoff on HTTP 429 + `Retry-After`.
7. **Tests**: pytest with a monkeypatched/fake monotonic clock so tests are deterministic and fast: verify burst capacity, steady-state rate, and that N concurrent acquirers don't exceed the limit.
Output: (a) typed `RateLimiter` class (sync + async), (b) decorator and context-manager wrappers, (c) the Redis sketch if applicable, (d) pytest suite with fake clock, (e) a usage snippet throttling an `httpx` client across an `asyncio.gather` fan-out.

Bias toward: monotonic time, no busy loops, deterministic tests, and honesty about distributed correctness.
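
As a reference point when reviewing a model's answer, the lazy-refill core requested in items 2, 3, and 6 might look roughly like the sketch below. It is not part of the prompt. The names `TokenBucket` and `FakeClock`, and the injectable `clock` and `sleep` parameters, are assumptions made here for illustration; the prompt itself only asks for a `RateLimiter` class and leaves these details open. The sketch covers the synchronous, in-process case only.

```python
import threading
import time
from typing import Callable, List, Optional


class TokenBucket:
    """Illustrative in-process bucket: `rate` tokens/second, at most `capacity` stored."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests (assumption)
        sleep: Callable[[float], None] = time.sleep,  # injectable for tests (assumption)
    ) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        # Lazy refill: credit fractional tokens for elapsed time only on demand.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self, n: float = 1, timeout: Optional[float] = None) -> None:
        if n > self.capacity:
            raise ValueError("n exceeds capacity and could never be granted")
        deadline = None if timeout is None else self._clock() + timeout
        while True:
            with self._lock:  # critical section: refill, check, deduct
                self._refill()
                if self._tokens >= n:
                    self._tokens -= n
                    return
                wait = (n - self._tokens) / self.rate  # time until enough tokens accrue
            # Fail fast if the required wait would overrun the deadline.
            if deadline is not None and self._clock() + wait > deadline:
                raise TimeoutError(f"could not acquire {n} token(s) within {timeout}s")
            self._sleep(wait)  # sleep outside the lock; re-check afterwards


class FakeClock:
    """Hypothetical test double: sleeping advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0
        self.slept: List[float] = []

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.slept.append(seconds)
        self.now += seconds


clock = FakeClock()
bucket = TokenBucket(rate=2.0, capacity=4.0, clock=clock.time, sleep=clock.sleep)
for _ in range(4):
    bucket.acquire()  # the initial burst of 4 is served without sleeping
bucket.acquire()      # the fifth call sleeps 0.5 fake seconds for one token
```

Injecting the clock and sleep function is one way to get the deterministic tests item 7 asks for without monkeypatching `time` globally. Sleeping outside the lock and re-checking afterwards means a waiter that loses a race to another thread simply computes a new wait instead of over-spending the bucket.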