Testing

How the Punix test suite is laid out, when to write which kind of test, and the dogfood loop for recipes.

The three pyramid layers

tests/ is split by cost. conftest.py auto-applies the marker from the directory, so put each test in the right place and the markers take care of themselves.

| Layer | Path | Cost | What goes here |
|---|---|---|---|
| Unit | tests/a_unit/<package>/ | fast, isolated, no I/O | Pure logic: parser, type checker, translator pattern-matching, canonical-hash computation. ~1 ms per test. |
| Integration | tests/b_integration/ | file I/O, no subprocesses | Component interactions through real bytes: Manifest.deploy(tmp_path), recipe-class script generation, sandbox setup. |
| End-to-end | tests/c_e2e/ | sandboxed subprocess, full CLI | A complete user workflow: punix install foo produces a store path; conformance properties. |

Under tests/a_unit/ the unit tests mirror src/punix/: a package's tests live in tests/a_unit/<package>/ (deploy/, realise/, frontend/, ir/, help/, migrate/), while tests for a top-level module (cli.py, bootstrap.py, upgrade.py) stay flat at tests/a_unit/. The marker keys on the a_unit ancestor, so nested tests still get unit automatically. Integration and e2e are grouped flat by cost.

Run a subset:

uv run pytest -m unit            # by pyramid marker
uv run pytest -m integration
uv run pytest -m e2e
uv run pytest tests/a_unit/         # by tier
uv run pytest tests/a_unit/deploy/  # one package's unit tests
uv run pytest tests/a_unit/migrate/test_migrate_brew.py::test_specific_case

Conformance: the executable spec

tests/c_e2e/test_conformance_stage*.py is the executable spec. Every claim made elsewhere on the docs site ("rollback is constant-time", "permuted source yields byte-identical builds", "secret values never reach the store") has a dedicated test that runs end-to-end and is a CI release-blocker.

uv run pytest tests/c_e2e/test_conformance_stage*.py -v

The suite has 41 conformance tests across eight files, plus 95 migration tests in tests/a_unit/migrate/test_migrate_brew.py that drive the brew→PCL translator against every install-block shape. See Conformance for the per-stage breakdown.

Daily loop

make test         # uv run pytest (fast — runs the whole suite in ~10 s)
make lint         # ruff + ty + pyrefly + mypy
make check        # alias for lint
make format       # ruff format + ruff check --fix
make test-cov     # pytest with HTML + term coverage report

The matrix runs in CI on Python 3.13 and 3.14:

nox -s tests      # run the test suite across the matrix locally
nox -s check      # lint/format/typecheck on Python 3.12 (the gate)

Where to put your test

The decision tree:

  1. Are you proving an architectural property that affects multiple components (atomic deploy, secret hash-exclusion, content-addressed determinism)? → New entry in tests/c_e2e/test_conformance_stage*.py. This is a CI release-blocker.
  2. Are you exercising a CLI subcommand end-to-end (a full punix install invocation, a deploy + rollback flow)? → tests/c_e2e/test_<command>*.py.
  3. Are you testing a component's interaction with another (manifest serialisation through real LocalTransport, a recipe class's generated script against a real shell)? → tests/b_integration/.
  4. Are you proving a pure-function property (parser accepts X; canonicalisation makes Y equivalent to Z; translator's regex catches the cargo-with-features shape)? → tests/a_unit/<package>/ (the package under test; a top-level module's tests stay flat at tests/a_unit/).

Default to unit. Promote to integration only when real I/O matters. Promote to e2e only when subprocess + multi-step workflow matters.
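
A unit test for a pure-function property ("canonicalisation makes Y equivalent to Z") can be this small. The sketch below uses a toy canonical_hash as a stand-in, since the real Punix function and its signature are not shown here; the point is the shape of the test, which needs no I/O and no fixtures.

```python
import hashlib
import json


def canonical_hash(recipe: dict) -> str:
    """Toy stand-in (hypothetical): hash a recipe independently of key order."""
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def test_key_order_does_not_change_hash():
    a = {"name": "foo", "deps": ["zlib", "openssl"]}
    b = {"deps": ["zlib", "openssl"], "name": "foo"}
    assert canonical_hash(a) == canonical_hash(b)


def test_different_content_changes_hash():
    a = {"name": "foo", "deps": ["zlib"]}
    b = {"name": "foo", "deps": ["openssl"]}
    assert canonical_hash(a) != canonical_hash(b)
```

Both tests exercise only in-memory values, which is what keeps them in the unit tier.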

Recipe-side testing: the dogfood loop

The make dogfood target builds every recipe in packages/official/ + packages/seeds/ from source. A red dogfood blocks the release.

make dogfood-list     # show what's in the dogfood corpus
make dogfood-dry      # --dry-run: type-check + build closure pre-pass, no compile
make dogfood          # actually build everything (cache-hits if nothing changed)
make dogfood-translate # regenerate packages/experimental/from-homebrew/ from brew
make dogfood-check    # translate + check, no compile (fast feedback for translator changes)
make dogfood-smoke    # build one recipe per build-system family

Workflow for adding a recipe:

# 1. Translate or hand-write packages/official/foo.pcl

# 2. Build just it (with the full corpus available as deps)
d=$(mktemp -d -t punix-XXXXXX)
cp packages/official/*.pcl packages/seeds/*.pcl "$d/"
uv run punix build "$d" --only foo

# 3. Confirm no regression elsewhere
make dogfood

If your recipe changes a shared std.* class's behaviour, the cache invalidates for every recipe using that class — expect a multi-minute rebuild on the first make dogfood.
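
To see why one class change fans out to many rebuilds, consider a cache key derived from both the recipe and the class it uses. This is an illustrative sketch only: cache_key and its inputs are hypothetical, and the actual Punix key derivation may include more than these two parts.

```python
import hashlib


def cache_key(recipe_source: str, class_source: str) -> str:
    """Hypothetical key: hash of the recipe text plus the std.* class it uses."""
    h = hashlib.sha256()
    for part in (recipe_source, class_source):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Two unchanged recipes that share one class (placeholder source text).
recipes = {"foo": "recipe foo", "bar": "recipe bar"}
before = {name: cache_key(src, "class body v1") for name, src in recipes.items()}
after = {name: cache_key(src, "class body v2") for name, src in recipes.items()}

# Every recipe that uses the edited class gets a new key, so none can cache-hit.
invalidated = sorted(name for name in recipes if before[name] != after[name])
print(invalidated)  # ['bar', 'foo']
```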

Translator changes

When editing src/punix/migrate/brew.py:

# Unit tests for translator patterns (95 cases)
uv run pytest tests/a_unit/migrate/test_migrate_brew.py

# Fast feedback against the real homebrew snapshot
make dogfood-check

# Broader validation: build one recipe per build-system family (~5 min cold)
make dogfood-smoke

Adding a new translator pattern? Write the unit test first — pick a representative brew formula, paste a minimal version into the test, assert the translated output. Examples at the bottom of test_migrate_brew.py.

Hermeticity contract

Tests never depend on the host's package manager state, network access, or installed binaries beyond the dev-env prerequisites (git, curl, make, clang).

  • LocalTransport(tmp_path) is a real implementation of the Transport Protocol writing to a sandbox dir. Use it instead of mocking. tmp_path is a pytest fixture; pass it as the root for any deploy-side fixture.
  • The realise sandbox writes to a per-build temp dir. No ~/.punix/store/ mutation during unit/integration tests.
  • The conformance suite uses real subprocess calls for git, cargo, gcc, etc. — these are the only "external dependency" tests. They run under the e2e marker.
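
The idea behind a sandbox-rooted transport can be sketched as follows. The Transport protocol shape, SandboxTransport, and install_unit are assumptions made for this example; the real LocalTransport and Transport Protocol may expose different methods.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Transport(Protocol):
    """Assumed minimal interface: write bytes to a target path."""

    def write(self, path: str, data: bytes) -> None: ...


class SandboxTransport:
    """Hypothetical stand-in for LocalTransport: maps absolute paths under a root."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def write(self, path: str, data: bytes) -> None:
        # Strip the leading slash so "/etc/x" lands at <root>/etc/x.
        target = self.root / path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


def install_unit(transport: Transport, name: str, body: bytes) -> None:
    """Toy deploy step that writes one systemd unit file."""
    transport.write(f"/etc/systemd/system/{name}.service", body)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    install_unit(SandboxTransport(root), "api", b"[Unit]\nDescription=api\n")
    written = (root / "etc/systemd/system/api.service").read_bytes()
```

In a pytest test, tmp_path would take the place of the TemporaryDirectory, and nothing outside that directory is touched.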

Property tests with stubs over mocks

Prefer stubs over mocks. Verify state, not behaviour.

from unittest.mock import MagicMock

# LocalTransport, deploy, and stack come from the code under test (imports omitted).

# Good: state assertion against a real Transport.
def test_deploy_writes_unit_file(tmp_path):
    transport = LocalTransport(tmp_path)
    deploy(stack, transport=transport)
    assert (tmp_path / "etc/systemd/system/api.service").exists()

# Avoid: mock assertion over internal call patterns.
def test_deploy_calls_write_method():
    transport = MagicMock()
    deploy(stack, transport=transport)
    transport.write.assert_called_with("/etc/systemd/system/api.service", ...)

The first survives refactors that change the internal call sequence. The second breaks the moment you move the write call into a helper.

Type-checker

Three type checkers run in parallel (the make check gate):

uv run ty           # primary fast checker
uv run pyrefly      # secondary; catches different edge cases
uv run mypy src     # legacy gate; strictest on src/

Disagreements between them are real signals — when one passes and another fails, the one that fails usually found something subtle. Don't silence; investigate.

Coverage

make test-cov       # HTML report at htmlcov/index.html
make coverage       # top-500 brew-coverage report (separate metric — what % of brew's analytics-top-500 fully resolves through the corpus)

Two separate "coverage" numbers, both useful:

  • Code coverage — % of src/punix/ lines hit by the test suite. Standard.
  • Corpus coverage — % of Homebrew's top-500 install-on-request set whose dep closure is resolvable through packages/official/. See Corpus status.
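
The corpus-coverage definition above can be sketched as a closure check over a dependency graph. This is an illustration under assumed inputs, not the implementation behind make coverage: the function names, the dict-based graph, and the sample packages are all hypothetical.

```python
def closure(pkg: str, deps: dict[str, list[str]]) -> set[str]:
    """All packages reachable from pkg through deps, including pkg itself."""
    seen: set[str] = set()
    stack = [pkg]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return seen


def corpus_coverage(top: list[str], deps: dict[str, list[str]], available: set[str]) -> float:
    """Percentage of top packages whose whole dep closure has a recipe."""
    if not top:
        return 0.0
    resolved = [p for p in top if closure(p, deps) <= available]
    return 100.0 * len(resolved) / len(top)


# Made-up sample data: "oniguruma" and "pcre2" have no recipe in the corpus.
deps = {"curl": ["openssl", "zlib"], "jq": ["oniguruma"], "git": ["curl", "pcre2"]}
available = {"curl", "openssl", "zlib", "jq", "git"}
top = ["curl", "jq", "git", "zlib"]

print(corpus_coverage(top, deps, available))  # 50.0
```

A package counts only when every transitive dependency resolves, so a single missing leaf recipe (pcre2 here) can hold back several top-level packages at once.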