Resilient HTTP in Python with requests and httpx Retry Sessions
A bare requests.get against a flaky API will eventually page you. Connection pooling, timeouts, and retry transports turn fragile scripts into reliable ones.
The single most common bug in ops scripts that talk to APIs is also the most invisible: a bare requests.get(url). It works in testing, it works most of the time in production, and then the upstream has a five-second blip or returns a 503 and your script either hangs forever or crashes — and because it was a cron job, you find out when something downstream is already broken.
Three habits fix nearly all of it: always set a timeout, reuse a connection-pooled session, and retry transient failures with backoff. Both requests and the newer httpx support all three. Here’s how to wire them up so your scripts stop being fragile.
Habit 1: always set a timeout
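requests will wait forever by default if the server stops responding, and that is the usual cause of the mysterious hung cron job. httpx is safer out of the box, with a default timeout of 5 seconds, but that default is easy to switch off and is rarely the right value for every endpoint. Either way, set the timeout explicitly on every single request.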
import httpx
# httpx lets you split connect vs read timeouts — usually what you want
timeout = httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0)
resp = httpx.get("https://api.example.com/status", timeout=timeout)
A short connect timeout fails fast when the host is unreachable; a longer read timeout tolerates a slow-but-working endpoint. The one value you must never accept is "wait indefinitely," which is what requests gives you when you pass no timeout at all. If you take nothing else from this article, put a timeout on every request you make.
Habit 2: reuse a session for connection pooling
Calling the module-level httpx.get() or requests.get() opens a fresh TCP (and TLS) connection every time. For one call that’s fine. In a loop over 500 endpoints it’s wasteful and slow — you pay the handshake cost repeatedly. A Client/Session object pools and reuses connections:
import httpx

# Assumes `token` (an API token string) and `paths` (an iterable of URL paths) are defined earlier.
with httpx.Client(
    base_url="https://api.example.com",
    timeout=httpx.Timeout(10.0),
    headers={"Authorization": f"Bearer {token}"},
    limits=httpx.Limits(max_connections=20, max_keepalive_connections=10),
) as client:
    for path in paths:
        r = client.get(path)  # reuses pooled connections, shares headers
        r.raise_for_status()
Beyond speed, the session lets you set headers, base URL, and auth once instead of repeating them on every call. The limits keep you from opening hundreds of simultaneous connections and overwhelming the upstream — a politeness that also keeps you from getting rate-limited or blocked.
Habit 3: retry transient failures with backoff
Networks blip. Servers return 502/503/429 under load. Most of these are transient — the request would succeed if you waited a moment and tried again. But you want to retry only the safe, transient cases, with exponential backoff, and give up after a few attempts.
In httpx, the clean way is a retrying transport:
import httpx
transport = httpx.HTTPTransport(retries=3) # retries connection errors
client = httpx.Client(transport=transport, timeout=httpx.Timeout(10.0))
httpx’s built-in transport retries handle connection errors but not HTTP status codes. For status-based retries (the 503s and 429s), I use the tenacity library, which makes the policy explicit:
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def is_retryable(exc):
    # Transient network failures: connection errors plus connect and read timeouts
    if isinstance(exc, (httpx.ConnectError, httpx.ConnectTimeout, httpx.ReadTimeout)):
        return True
    # HTTP errors: retry only the status codes that usually clear up on their own
    if isinstance(exc, httpx.HTTPStatusError):
        return exc.response.status_code in RETRYABLE_STATUS
    return False

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception(is_retryable),
    reraise=True,  # surface the original exception once attempts run out
)
def fetch(client, path):
    r = client.get(path)
    r.raise_for_status()  # raises HTTPStatusError on 4xx/5xx
    return r.json()
The important judgment call lives in is_retryable. You retry timeouts, connection errors, and the transient 5xx/429 codes. You do not retry a 400 or 404 — those are bugs in your request, and retrying just wastes time and hammers the server with the same broken call five times. Exponential backoff (wait_exponential) spaces the attempts out so you’re not slamming a struggling service.
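To make the spacing concrete, here is a small standard-library sketch of a capped exponential schedule using the same parameters (multiplier 1, minimum 1 second, maximum 30 seconds). It illustrates the general shape of exponential backoff and is not tenacity's own code; `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(attempts, multiplier=1.0, min_wait=1.0, max_wait=30.0):
    """Return the wait in seconds before each retry, given `attempts` total tries."""
    delays = []
    for retry_number in range(1, attempts):  # no wait after the final attempt
        delay = multiplier * 2 ** (retry_number - 1)  # 1, 2, 4, 8, ...
        delay = max(min_wait, min(delay, max_wait))   # clamp to [min_wait, max_wait]
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0]
```

With five attempts there are four waits, so a call that fails every time gives up after about 15 seconds of backoff plus whatever time each request itself consumed.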
Don’t retry non-idempotent writes blindly
A subtle trap: retrying a POST that creates a resource can create it twice if the first request actually succeeded but the response got lost. Retries are safe for idempotent operations (GET, PUT, DELETE) but dangerous for naive POSTs. For writes, either:
- only retry on connection errors that happened before the request was sent, or
- use an idempotency key (many APIs support an Idempotency-Key header) so a duplicate request is deduplicated server-side.
When in doubt, retry reads freely and writes carefully.
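The idempotency-key option can be sketched as follows. The important detail is that the key is generated once and reused on every attempt, so the server can recognize a retry as a duplicate. This is a minimal sketch under assumptions: the helper name `post_with_idempotency_key` is hypothetical, `send` stands in for something like `client.post`, the default `retry_on` is a placeholder for your HTTP library's connection error types, and whether the API honors `Idempotency-Key` (and for how long) depends on the API.

```python
import uuid


def post_with_idempotency_key(send, url, payload, attempts=3, retry_on=(ConnectionError,)):
    """POST with a single idempotency key shared by every attempt.

    `send` is any callable shaped like client.post(url, json=..., headers=...).
    `retry_on` is a placeholder; pass your HTTP library's connection error types.
    """
    # Generate the key once, outside the loop, so each retry carries the same value.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, attempts + 1):
        try:
            return send(url, json=payload, headers=headers)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
```

With httpx this might be called as `post_with_idempotency_key(client.post, "/orders", {"sku": "abc"}, retry_on=(httpx.ConnectError,))`. Backoff between attempts is left out here to keep the sketch short.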
The requests equivalent
If you’re on requests rather than httpx, the same three habits apply via a Session with an HTTPAdapter and urllib3’s Retry:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,
    # Exponential backoff: the delay doubles on each retry, up to urllib3's cap.
    # urllib3 typically makes the first retry immediately, so expect roughly 0s, 2s, 4s, 8s...
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods={"GET", "PUT", "DELETE"},  # POST deliberately left out
)
session = requests.Session()
adapter = HTTPAdapter(max_retries=retry, pool_maxsize=20)
session.mount("https://", adapter)
session.mount("http://", adapter)

# requests has no session-wide timeout, so pass it on every call: (connect, read)
resp = session.get("https://api.example.com/status", timeout=(5, 30))
Notice allowed_methods deliberately excludes POST — the same non-idempotent-write caution, baked into the retry policy. And timeout=(5, 30) is the connect/read split.
Putting it together
A resilient HTTP client is just those three habits combined: a pooled session, a timeout on every request, and a retry policy that backs off and only retries transient, safe failures. Wrap that once at the top of your script and every call you make inherits it. The flaky-API-that-pages-you problem mostly evaporates — not because the API got more reliable, but because your script finally stopped assuming it was.
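One wrinkle with requests: a Session carries the pool and the retry policy, but not a default timeout, so a call that forgets `timeout=` still waits indefinitely. A common workaround is an adapter subclass that fills in a timeout when none is given. The sketch below assumes requests and urllib3 1.26 or later are installed; `TimeoutHTTPAdapter` and `make_session` are hypothetical names, not part of either library.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller passes none."""

    def __init__(self, *args, timeout=(5, 30), **kwargs):
        self.timeout = timeout  # (connect, read) in seconds
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send always passes a timeout keyword; it is None when unset.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def make_session(timeout=(5, 30)):
    """Build a Session with pooling, retries, and a default timeout."""
    retry = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods={"GET", "PUT", "DELETE"},  # POST deliberately left out
    )
    adapter = TimeoutHTTPAdapter(max_retries=retry, pool_maxsize=20, timeout=timeout)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

A script would call `session = make_session()` once near the top and use `session.get(...)` everywhere; an explicit `timeout=` on an individual call still overrides the default.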
Retry and timeout values are starting points. Tune them to the API’s documented SLAs and rate limits, and never retry non-idempotent writes without an idempotency key.