Automating GitHub with Python and the REST API

A surprising amount of platform-team toil is GitHub bookkeeping: auditing which of 80 repos have branch protection, closing stale issues, syncing labels across an org, opening the same dependency-bump PR everywhere. Doing it by hand is mind-numbing and error-prone. Doing it with the GitHub API is a short Python script and a coffee.

I’ve automated a lot of this, and the patterns that matter are the same ones that bite people: pagination, rate limits, and token handling. Get those three right and GitHub automation becomes reliable instead of a thing that works in testing and falls over against the real org.

Authenticating without leaking the token

First rule: the token never appears in the source. Pull it from the environment, and prefer a fine-grained personal access token or a GitHub App installation token scoped to exactly what the script needs.

import os
import httpx

TOKEN = os.environ["GITHUB_TOKEN"]            # set in env / secret manager, never hardcoded

client = httpx.Client(
    base_url="https://api.github.com",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    },
    timeout=30.0,
)

Setting the X-GitHub-Api-Version header pins you to a known API version so a future GitHub change doesn’t silently alter your script’s behavior. The Accept header asks for the current JSON media type. Both are cheap insurance.
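A small fail-fast wrapper turns a missing token into a clear message instead of a bare KeyError traceback. It also gives you a safe way to say which credential ran without printing it. This is a minimal sketch: `load_token` and `mask` are hypothetical helper names, not part of httpx or GitHub's API.

```python
import os


def load_token(var: str = "GITHUB_TOKEN") -> str:
    """Read the token from the environment, failing with a clear message if absent."""
    token = os.environ.get(var, "").strip()
    if not token:
        # Name the variable in the error, never any value.
        raise SystemExit(f"{var} is not set; export it or inject it from your secret manager")
    return token


def mask(token: str, keep: int = 4) -> str:
    """Return a log-safe stand-in that shows only the last few characters."""
    if len(token) <= keep * 2:
        return "****"  # too short to reveal anything safely
    return "****" + token[-keep:]
```

Use `TOKEN = load_token()` in place of the direct `os.environ` lookup. If a log line needs to identify the credential, log `mask(TOKEN)` and never the token itself.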

You can use the PyGithub library instead of raw HTTP, and it’s pleasant for simple cases. I tend to use the REST API directly when I want full control over pagination and rate-limit handling, which is most ops scripts.

Pagination, the bug that hides in plain sight

Just like with cloud APIs, GitHub paginates list endpoints: most return 30 items per page by default and at most 100. Request an org's repository list once and you'll process the first 30 of 200 repos and never know the rest existed. The page links live in the Link response header.

def paginate(client, url, params=None):
    params = dict(params or {})
    params["per_page"] = 100                   # fewer round-trips
    while url:
        resp = client.get(url, params=params)
        resp.raise_for_status()
        yield from resp.json()
        # GitHub puts the next page URL in the Link header
        url = resp.links.get("next", {}).get("url")
        params = None                          # next URL already has the cursor

# List every repo in an org, all pages
repos = list(paginate(client, "/orgs/myorg/repos"))
print(f"{len(repos)} repos")

httpx parses the Link header into resp.links for you, so following next is one lookup. Once the next link is gone, you’ve seen every page. Build every list operation on a helper like this and the “only saw the first page” bug disappears for good.
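Some clients, such as plain urllib.request, don't parse Link for you. The header is simple enough to parse by hand. This is a minimal sketch that assumes GitHub's usual `<url>; rel="name"` entry format. `parse_link_header` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Matches one '<url>; rel="name"' entry in a Link header.
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value: Optional[str]) -> Dict[str, str]:
    """Map each rel ("next", "last", ...) to its URL; empty dict if there is no header."""
    links: Dict[str, str] = {}
    for url, rels in _LINK_ENTRY.findall(value or ""):
        # A single entry may carry several space-separated rel values.
        for rel in rels.split():
            links[rel] = url
    return links
```

With urllib, pass it `resp.headers.get("Link")`. A missing "next" key means you're on the last page, which is the same stopping rule the httpx helper uses.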

Respecting rate limits

Requests authenticated with a personal access token get 5,000 per hour; GitHub App installations have their own limits. A naive org-wide loop can burn through that, and once you hit zero GitHub answers with 403 or 429 responses. The right behavior is to read the rate-limit headers and wait for the reset when you're about to run dry, rather than hammering until you're blocked.

import time

def get_with_ratelimit(client, url, **kwargs):
    while True:
        resp = client.get(url, **kwargs)
        remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
        if resp.status_code in (403, 429) and remaining == 0:   # primary limit exhausted
            reset = int(resp.headers["X-RateLimit-Reset"])
            wait = max(reset - int(time.time()), 1)
            print(f"rate limited, sleeping {wait}s")
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp

The X-RateLimit-Reset header is a Unix timestamp (seconds since the epoch, UTC) that tells you exactly when your budget refills, so you sleep precisely as long as needed instead of a guessed-at interval. For heavy read workloads, also consider GitHub's GraphQL API. It has its own points-based rate limit, separate from REST, and it often fetches in one query what REST needs many calls for.
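The same headers also let a long job slow down before it is actually blocked. The sketch below keeps the decision logic separate from the HTTP call so it can be tested without network access. It uses only the standard library. `rate_limit_wait` and its `floor` parameter are hypothetical. The `Retry-After` branch follows GitHub's documented guidance for secondary rate limits, but check the current docs for exact behavior.

```python
import time
from collections.abc import Mapping
from typing import Optional


def rate_limit_wait(headers: Mapping, now: Optional[float] = None,
                    floor: int = 0) -> float:
    """Return how many seconds to sleep before the next request (0.0 means go ahead).

    floor: start waiting once X-RateLimit-Remaining drops to this value or below,
    so the script pauses before GitHub starts refusing requests.
    """
    now = time.time() if now is None else now
    # Header names are case-insensitive; lowercase them so a plain dict works too.
    h = {k.lower(): v for k, v in headers.items()}

    # Secondary rate limits: honor Retry-After (in seconds) when it is present.
    if "retry-after" in h:
        return float(h["retry-after"])

    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")
    if remaining is None or reset is None:
        return 0.0  # this response carries no rate-limit information
    if int(remaining) <= floor:
        # Reset is a Unix timestamp; add one second of slack for clock skew.
        return max(int(reset) - now, 0.0) + 1.0
    return 0.0
```

Call it after each response, for example `time.sleep(rate_limit_wait(resp.headers, floor=50))`. httpx's `resp.headers` is already case-insensitive. The lowercasing only exists so that plain dicts behave the same way in tests.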

A real task: audit branch protection across an org

Putting it together — find every repo missing branch protection on its default branch:

def unprotected_repos(client, org):
    for repo in paginate(client, f"/orgs/{org}/repos", {"type": "all"}):
        if repo["archived"]:
            continue
        name = repo["name"]
        branch = repo["default_branch"]
        resp = client.get(f"/repos/{org}/{name}/branches/{branch}/protection")
        if resp.status_code == 404:            # 404 == no protection configured
            yield name
        elif resp.status_code == 403:
            print(f"  no admin access to {name}, skipping")
        else:
            resp.raise_for_status()

for name in unprotected_repos(client, "myorg"):
    print(f"UNPROTECTED: {name}")

Note the explicit 404 handling: for the protection endpoint, a 404 is the answer (“not protected”), not an error. Reading the status code instead of catching a generic exception lets you tell “not protected” apart from “you don’t have permission to check.” That distinction is the difference between a useful audit and a misleading one.
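For an audit that someone else will read, it helps to keep all three outcomes in the output rather than printing only the failures. That way "couldn't check" never silently reads as "fine." The minimal sketch below turns (repo, status code) pairs into CSV. `audit_report` is a hypothetical helper. The status mapping mirrors the audit loop's handling, with a 200 from the protection endpoint meaning protection is configured.

```python
import csv
import io
from typing import Iterable, Tuple

# Same interpretation as the audit loop: the status code is the answer.
OUTCOMES = {
    200: "protected",
    404: "unprotected",
    403: "unknown: no admin access",
}


def audit_report(results: Iterable[Tuple[str, int]]) -> str:
    """Render (repo_name, status_code) pairs as CSV, sorted by repo name."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["repo", "status", "outcome"])
    for repo, status in sorted(results):
        # Unexpected codes are flagged as errors instead of being dropped.
        writer.writerow([repo, status, OUTCOMES.get(status, "error")])
    return buf.getvalue()
```

To feed it, change the audit generator so it yields `(name, resp.status_code)` for every repo, not just the unprotected ones.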

Making changes safely

For write operations — closing issues, creating PRs, updating labels — apply the same discipline I’d use anywhere destructive:

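def close_stale_issues(client, org, repo, dry_run=True):
    cutoff = "2026-01-01T00:00:00Z"
    for issue in paginate(client, f"/repos/{org}/{repo}/issues",
                          {"state": "open", "labels": "stale"}):
        if "pull_request" in issue:            # the issues endpoint also returns PRs
            continue
        # GitHub timestamps share one ISO 8601 format, so string comparison orders them
        if issue["updated_at"] >= cutoff:      # touched since the cutoff, leave it open
            continue
        number = issue["number"]
        if dry_run:
            print(f"WOULD close #{number}: {issue['title']}")
            continue
        resp = client.patch(f"/repos/{org}/{repo}/issues/{number}",
                            json={"state": "closed"})
        resp.raise_for_status()                # surface failed writes instead of ignoring them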

dry_run=True by default. The script prints exactly what it would close, you eyeball the list, and only then re-run with dry_run=False. The skip for "pull_request" in issue matters too — GitHub’s issues endpoint returns PRs as well, and you rarely want to bulk-close those by accident.
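If you pull the selection rule out of the loop, you can test the dry-run logic without touching GitHub. Here is a minimal sketch over the issue dicts the endpoint returns. `plan_closures` is a hypothetical name. Treating "not updated since a cutoff" as the staleness rule is an assumption, so substitute your own rule if it differs.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def parse_ts(value: str) -> datetime:
    """Parse GitHub's ISO 8601 timestamps such as '2025-11-03T09:15:00Z'."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def plan_closures(issues: Iterable[dict], cutoff: str) -> List[Tuple[int, str]]:
    """Return (number, title) for real issues last updated before the cutoff."""
    cutoff_dt = parse_ts(cutoff)
    plan = []
    for issue in issues:
        if "pull_request" in issue:  # PRs come back from the issues endpoint too
            continue
        if parse_ts(issue["updated_at"]) >= cutoff_dt:
            continue  # updated recently; leave it open
        plan.append((issue["number"], issue["title"]))
    return plan
```

In dry-run mode, print the plan. Once you've reviewed it, issue the PATCH calls for its entries and nothing else.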

The habits worth keeping

Token from env, scoped tight, never logged. Fine-grained tokens or App installation tokens beat classic PATs with broad scopes.
Paginate every list call. It’s the most common silent bug.
Read rate-limit headers and back off instead of retrying blindly into a 403 wall.
Default writes to dry-run and print the plan before executing.
Match on status codes, not exception strings — 404 often means “absent,” which may be exactly what you’re checking for.

GitHub automation is some of the highest-leverage scripting a platform team can do: a 60-line script replaces an afternoon of clicking, and it runs the same way every time. Build it on paginated, rate-aware, dry-run-by-default foundations and it’ll keep working as your org grows.


API behavior, scopes, and rate limits change. Verify endpoints against current GitHub docs and test write operations against a throwaway repo before running them org-wide.