Testing Your Scripts with Bats and pytest Before They Hit

The scripts that run our infrastructure are code, but they are the code least likely to have tests. We test the application and ship the deploy script that touches production on a wing and a prayer. Then it fails on an edge case at the worst possible moment.

In 25 years I have learned that automation scripts deserve tests precisely because they run in dangerous places. Here is how I test bash with bats and Python with pytest, including how to test the risky parts without actually running them.

Testing bash with bats

bats (Bash Automated Testing System) lets you write real assertions against shell scripts. A test file is just functions named with @test:

#!/usr/bin/env bats

setup() {
  source ./lib/utils.sh
}

@test "slugify lowercases and replaces spaces" {
  result="$(slugify 'Hello World')"
  [ "$result" = "hello-world" ]
}

@test "deploy fails without an environment argument" {
  run ./deploy.sh
  [ "$status" -eq 1 ]
  [[ "$output" =~ "usage" ]]
}

The run helper is the heart of bats: it runs a command and captures $status (exit code) and $output (combined stdout/stderr) without letting a failure abort the test. So you can assert that your script correctly fails — that a missing argument exits 1 with a usage message. That negative-path testing is exactly what bash scripts usually lack.

Make functions testable

The trick to testable bash is to put logic in functions in a sourceable file, separate from the script that calls them. A script that is one long top-to-bottom blob can only be tested end-to-end. Pull slugify, validate_env, and friends into lib/utils.sh, source it in both the real script and the test, and you can unit-test each function.

Testing Python with pytest

pytest needs almost no ceremony. A function named test_* with a bare assert is a complete test:

# test_parsing.py
from mytool.parse import parse_duration

def test_parse_duration_seconds():
    assert parse_duration("30s") == 30

def test_parse_duration_minutes():
    assert parse_duration("5m") == 300

def test_parse_duration_rejects_garbage():
    import pytest
    with pytest.raises(ValueError):
        parse_duration("banana")

Run pytest and it discovers and runs everything. The pytest.raises context manager tests the error path — that bad input raises cleanly rather than returning something weird. Test the failure cases as deliberately as the happy path; that is where the real bugs hide.

The hard part: testing risky commands

Here is the central problem. Your deploy script runs kubectl apply or rm -rf or aws s3 sync. You cannot run those for real in a test. The answer is to mock them so the test verifies what would have happened without it happening.

Mocking in pytest

Python’s unittest.mock replaces the dangerous call with a fake that records how it was called:

from unittest.mock import patch

def test_deploy_calls_apply_with_prod_manifest():
    with patch("mytool.deploy.run_command") as mock_run:
        deploy(env="production")
        mock_run.assert_called_once_with(
            ["kubectl", "apply", "-f", "manifests/production.yaml"]
        )

The real kubectl never runs. You assert that if it had, it would have applied the production manifest. This lets you test deploy logic safely — exactly the logic you most want covered.

Stubbing commands in bats

For bash, put a fake command earlier on the PATH than the real one:

@test "deploy invokes kubectl apply" {
  mkdir -p "$BATS_TMPDIR/bin"
  cat > "$BATS_TMPDIR/bin/kubectl" <<'EOF'
#!/usr/bin/env bash
echo "kubectl $*" >> "$BATS_TMPDIR/calls.log"
EOF
  chmod +x "$BATS_TMPDIR/bin/kubectl"
  PATH="$BATS_TMPDIR/bin:$PATH" run ./deploy.sh production
  grep -q "kubectl apply" "$BATS_TMPDIR/calls.log"
}

The fake kubectl just logs its arguments. Your script thinks it ran kubectl; nothing touched a cluster. This is the bash equivalent of mocking, and it is how you test destructive scripts safely.

Test the edge cases that actually break

When deciding what to test, target the inputs that cause real incidents:

Empty or missing arguments.
Paths with spaces or special characters.
A command that returns non-zero partway through.
An empty file or empty API response.
The second run of a supposedly idempotent script.

Each of those is a production outage in waiting. A handful of tests covering them is worth more than a hundred tests of the happy path.

Where AI fits

AI is genuinely strong at test generation because tests are pattern-following. I paste a function and ask:

“Write pytest tests for this function. Cover the happy path, empty input, invalid input that should raise, and any boundary conditions. Use pytest.raises for the error cases. Don’t test the implementation details, just the behavior.”

The model is good at enumerating edge cases I would skip out of laziness — the empty string, the off-by-one boundary, the malformed input. I review each test to make sure it asserts real behavior and not just current output. I keep these test-generation prompts in my prompt library.

Make it part of the workflow

Wire bats and pytest into pre-commit or CI so scripts cannot merge untested. The bar does not need to be high — even a few tests covering the failure paths catch the bugs that page you. The goal is not 100% coverage; it is that the deploy script which touches production has been run at least once somewhere other than production.

For more on building reliable automation, see our bash and Python automation guides.

Testing Your Scripts with Bats and pytest Before They Hit Production