Automating GitHub with Python and the REST API
From auto-labeling PRs to bulk repo audits, GitHub's API turns tedious org-wide chores into a script. Here's how to do it without getting rate-limited or leaking tokens.
A surprising amount of platform-team toil is GitHub bookkeeping: auditing which of 80 repos have branch protection, closing stale issues, syncing labels across an org, opening the same dependency-bump PR everywhere. Doing it by hand is mind-numbing and error-prone. Doing it with the GitHub API is a short Python script and a coffee.
I’ve automated a lot of this, and the patterns that matter are the same ones that bite people: pagination, rate limits, and token handling. Get those three right and GitHub automation becomes reliable instead of a thing that works in testing and falls over against the real org.
Authenticating without leaking the token
First rule: the token never appears in the source. Pull it from the environment, and prefer a fine-grained personal access token or a GitHub App installation token scoped to exactly what the script needs.
import os

import httpx

TOKEN = os.environ["GITHUB_TOKEN"]  # set in env / secret manager, never hardcoded

client = httpx.Client(
    base_url="https://api.github.com",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    },
    timeout=30.0,
)
Setting the X-GitHub-Api-Version header pins you to a known API version so a future GitHub change doesn’t silently alter your script’s behavior. The Accept header asks for the current JSON media type. Both are cheap insurance.
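One way to keep the token out of both source and logs is to fail fast when the variable is missing and to redact the value before anything gets printed. This is a minimal sketch using only the standard library; `build_headers` and `redact` are hypothetical helper names, not part of httpx or any GitHub tooling.

```python
import os


def build_headers(env=os.environ):
    """Build GitHub request headers from an environment mapping."""
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        # Fail loudly rather than silently sending unauthenticated requests
        raise RuntimeError("GITHUB_TOKEN is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def redact(headers):
    """Return a copy of the headers that is safe to print or log."""
    safe = dict(headers)
    if "Authorization" in safe:
        safe["Authorization"] = "Bearer ***"
    return safe


# Illustrative usage with a made-up token value
demo_headers = build_headers({"GITHUB_TOKEN": "example-token"})
print(redact(demo_headers))
```

The dict returned by `build_headers` can be passed as `headers=` when constructing the client, and `redact` is what you would log if you ever need to debug a request.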
You can use the PyGithub library instead of raw HTTP, and it’s pleasant for simple cases. I tend to use the REST API directly when I want full control over pagination and rate-limit handling, which is most ops scripts.
Pagination, the bug that hides in plain sight
Just like cloud APIs, GitHub paginates list endpoints: 30 items per page by default, 100 max. Call /orgs/myorg/repos once and you’ll process the first 30 of 200 repos and never know the rest existed. The links to the remaining pages live in the Link response header.
def paginate(client, url, params=None):
    params = dict(params or {})
    params["per_page"] = 100  # fewer round-trips
    while url:
        resp = client.get(url, params=params)
        resp.raise_for_status()
        yield from resp.json()
        # GitHub puts the next page URL in the Link header
        url = resp.links.get("next", {}).get("url")
        params = None  # next URL already has the cursor

# List every repo in an org, all pages
repos = list(paginate(client, "/orgs/myorg/repos"))
print(f"{len(repos)} repos")
httpx parses the Link header into resp.links for you, so following next is one lookup. Once the next link is gone, you’ve seen every page. Build every list operation on a helper like this and the “only saw the first page” bug disappears for good.
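If you are using an HTTP client that does not parse the Link header for you, the parsing itself is small. The sketch below is a simplified stand-in for what `resp.links` provides, not httpx's actual implementation; `parse_link_header` is a hypothetical name and the sample header uses made-up URLs.

```python
import re

# Matches one `<url>; rel="name"` entry in a Link header
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel name (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


# Illustrative header in the shape GitHub returns for a list endpoint
header = (
    '<https://api.github.com/organizations/1/repos?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/organizations/1/repos?per_page=100&page=7>; rel="last"'
)
links = parse_link_header(header)
print(links.get("next"))
```

On the last page the header carries no `rel="next"` entry, so `links.get("next")` returns `None` and a loop like `paginate` stops.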
Respecting rate limits
Authenticated REST requests get 5,000/hour. A naive org-wide loop can burn through that, and once the remaining count hits zero GitHub answers with 403 (or 429) responses. The right behavior is to read the rate-limit headers and wait when you’re about to run dry, rather than hammering until you’re blocked.
import time

def get_with_ratelimit(client, url, **kwargs):
    while True:
        resp = client.get(url, **kwargs)
        remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
        # An exhausted primary limit comes back as 403 or 429 with remaining == 0
        if resp.status_code in (403, 429) and remaining == 0:
            reset = int(resp.headers["X-RateLimit-Reset"])
            wait = max(reset - int(time.time()), 1)
            print(f"rate limited, sleeping {wait}s")
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp
The X-RateLimit-Reset header is a unix timestamp telling you exactly when your budget refills — so you sleep precisely as long as needed, not a guessed-at interval. For heavy read workloads, also consider GitHub’s GraphQL API, which often fetches in one query what REST needs many calls for, conserving your budget.
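The wait calculation is easier to test when it is pulled out into a pure function. The sketch below also honors a `Retry-After` header, which GitHub sends with some secondary rate-limit responses. `backoff_seconds` is a hypothetical helper, and treating every other 403 as "not a rate limit" is an assumption you may want to tighten.

```python
import time


def backoff_seconds(status_code, headers, now=None):
    """Return how many seconds to wait before retrying, or None if not rate limited."""
    if status_code not in (403, 429):
        return None
    # Normalize header names so plain dicts work regardless of casing
    h = {k.lower(): v for k, v in headers.items()}
    now = time.time() if now is None else now
    if "retry-after" in h:
        # Secondary limits can state the wait directly, in seconds
        return max(int(h["retry-after"]), 1)
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # Primary limit: wait until the reset timestamp (unix seconds)
        return max(int(h["x-ratelimit-reset"]) - int(now), 1)
    # A 403 for some other reason, such as missing permissions
    return None


# Illustrative values: budget exhausted, reset 60 seconds from "now"
wait = backoff_seconds(
    403,
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"},
    now=1000,
)
print(wait)
```

Passing `now` explicitly keeps the function deterministic in tests; in a real loop you would omit it and sleep for the returned value.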
A real task: audit branch protection across an org
Putting it together — find every repo missing branch protection on its default branch:
def unprotected_repos(client, org):
    for repo in paginate(client, f"/orgs/{org}/repos", {"type": "all"}):
        if repo["archived"]:
            continue
        name = repo["name"]
        branch = repo["default_branch"]
        resp = client.get(f"/repos/{org}/{name}/branches/{branch}/protection")
        if resp.status_code == 404:  # 404 == no protection configured
            yield name
        elif resp.status_code == 403:
            print(f"  no admin access to {name}, skipping")
        else:
            resp.raise_for_status()

for name in unprotected_repos(client, "myorg"):
    print(f"UNPROTECTED: {name}")
Note the explicit 404 handling: for the protection endpoint, a 404 is the answer (“not protected”), not an error. Reading the status code instead of catching a generic exception lets you tell “not protected” apart from “you don’t have permission to check.” That distinction is the difference between a useful audit and a misleading one.
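The status-code mapping can be written down as a small lookup so the audit reports every outcome, not just the unprotected repos. This is a sketch under the same reading of the status codes as above; `classify_protection` and the sample data are hypothetical.

```python
from collections import Counter


def classify_protection(status_code):
    """Translate a branch-protection response status into an audit verdict."""
    if status_code == 200:
        return "protected"
    if status_code == 404:
        return "unprotected"  # the endpoint's way of saying "no rule configured"
    if status_code == 403:
        return "no_access"  # the token cannot read protection settings
    return "error"  # anything else deserves a closer look


# Made-up status codes standing in for real API responses
sample = {"api": 200, "docs": 404, "legacy": 403, "infra": 500}
summary = Counter(classify_protection(code) for code in sample.values())
for verdict, count in sorted(summary.items()):
    print(f"{verdict}: {count}")
```

Counting the `no_access` and `error` buckets alongside `unprotected` makes it obvious when an audit looks clean only because the token could not see half the org.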
Making changes safely
For write operations — closing issues, creating PRs, updating labels — apply the same discipline I’d use anywhere destructive:
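def close_stale_issues(client, org, repo, dry_run=True):
    # Only close issues with no activity since this instant
    cutoff = "2026-01-01T00:00:00Z"
    for issue in paginate(client, f"/repos/{org}/{repo}/issues",
                          {"state": "open", "labels": "stale"}):
        if "pull_request" in issue:  # the issues endpoint also returns PRs
            continue
        # UTC timestamps in the same ISO 8601 format compare correctly as strings
        if issue["updated_at"] >= cutoff:
            continue
        number = issue["number"]
        if dry_run:
            print(f"WOULD close #{number}: {issue['title']}")
            continue
        resp = client.patch(f"/repos/{org}/{repo}/issues/{number}",
                            json={"state": "closed"})
        resp.raise_for_status()  # surface failed writes instead of ignoring them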
dry_run=True by default. The script prints exactly what it would close, you eyeball the list, and only then re-run with dry_run=False. The skip for "pull_request" in issue matters too — GitHub’s issues endpoint returns PRs as well, and you rarely want to bulk-close those by accident.
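A hard-coded cutoff string has to be edited by hand every time the script runs. One alternative is to derive it from an idle window and compare parsed timestamps. This is a sketch, not the original script's logic: `should_close`, `parse_github_time`, and the 90-day default are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_github_time(value):
    """Parse a GitHub timestamp such as 2025-11-03T09:15:00Z into an aware datetime."""
    # Replace the trailing Z so older Python versions can parse it too
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def should_close(issue, now, max_idle_days=90):
    """Return True for a real issue with no updates inside the idle window."""
    if "pull_request" in issue:  # PRs show up in the issues endpoint too
        return False
    cutoff = now - timedelta(days=max_idle_days)
    return parse_github_time(issue["updated_at"]) < cutoff


# Made-up issues in the shape the issues endpoint returns
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
old_issue = {"number": 1, "updated_at": "2025-06-01T12:00:00Z"}
recent_issue = {"number": 2, "updated_at": "2025-12-20T08:30:00Z"}
old_pr = {"number": 3, "updated_at": "2025-06-01T12:00:00Z", "pull_request": {}}
for item in (old_issue, recent_issue, old_pr):
    print(item["number"], should_close(item, now))
```

Passing `now` in, rather than calling `datetime.now()` inside the function, keeps the dry-run output reproducible when you re-run it a few minutes later with `dry_run=False`.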
The habits worth keeping
- Token from env, scoped tight, never logged. Fine-grained tokens or App installation tokens beat classic PATs with broad scopes.
- Paginate every list call. It’s the most common silent bug.
- Read rate-limit headers and back off instead of retrying blindly into a 403 wall.
- Default writes to dry-run and print the plan before executing.
- Match on status codes, not exception strings: 404 often means “absent,” which may be exactly what you’re checking for.
GitHub automation is some of the highest-leverage scripting a platform team can do: a 60-line script replaces an afternoon of clicking, and it runs the same way every time. Build it on paginated, rate-aware, dry-run-by-default foundations and it’ll keep working as your org grows.
API behavior, scopes, and rate limits change. Verify endpoints against current GitHub docs and test write operations against a throwaway repo before running them org-wide.