Testing¶
How the Punix test suite is laid out, when to write which kind of test, and the dogfood loop for recipes.
The three pyramid layers¶
tests/ is split by cost. conftest.py auto-applies the marker from the directory, so put each test in the right place and the markers take care of themselves.
| Layer | Path | Cost | What goes here |
|---|---|---|---|
| Unit | tests/a_unit/<package>/ | fast, isolated, no I/O | Pure logic: parser, type checker, translator pattern-matching, canonical-hash computation. ~1 ms per test. |
| Integration | tests/b_integration/ | file I/O, no subprocesses | Component interactions through real bytes: Manifest.deploy(tmp_path), recipe-class script generation, sandbox setup. |
| End-to-end | tests/c_e2e/ | sandboxed subprocess, full CLI | A complete user workflow: punix install foo produces a store path; conformance properties. |
Under tests/a_unit/ the unit tests mirror src/punix/: a package's tests live in tests/a_unit/<package>/ (deploy/, realise/, frontend/, ir/, help/, migrate/), while tests for a top-level module (cli.py, bootstrap.py, upgrade.py) stay flat at tests/a_unit/. The marker keys on the a_unit ancestor, so nested tests still get unit automatically. Integration and e2e are grouped flat by cost.
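The directory-to-marker rule described above can be sketched as a conftest.py hook. This is a minimal illustration, not the project's actual conftest.py: the TIER_MARKERS mapping and the marker_for_path helper are hypothetical names, and the sketch assumes pytest 7 or newer, where each collected item exposes a pathlib-style item.path.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping from tier directory name to pytest marker name.
TIER_MARKERS = {"a_unit": "unit", "b_integration": "integration", "c_e2e": "e2e"}


def marker_for_path(path: Path) -> str | None:
    """Return the marker for the first tier directory among the path's parts."""
    for part in path.parts:
        if part in TIER_MARKERS:
            return TIER_MARKERS[part]
    return None


def pytest_collection_modifyitems(items):
    """Standard pytest hook: tag every collected test by its tier ancestor."""
    for item in items:
        name = marker_for_path(Path(item.path))
        if name is not None:
            # add_marker accepts a marker name as a plain string.
            item.add_marker(name)
```

Because the lookup scans every path component, a nested file such as tests/a_unit/deploy/test_x.py resolves to unit just like a flat one. The marker names would still need registering in the pytest configuration to avoid unknown-marker warnings.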
Run a subset:
uv run pytest -m unit # by pyramid marker
uv run pytest -m integration
uv run pytest -m e2e
uv run pytest tests/a_unit/ # by tier
uv run pytest tests/a_unit/deploy/ # one package's unit tests
uv run pytest tests/a_unit/migrate/test_migrate_brew.py::test_specific_case
Conformance: the executable spec¶
tests/c_e2e/test_conformance_stage*.py is the executable spec. Every claim made elsewhere on the docs site ("rollback is constant-time", "permuted source yields byte-identical builds", "secret values never reach the store") has a dedicated test that runs end-to-end and is a CI release-blocker.
The suite comprises 41 conformance tests across eight files, plus 95 migration tests in tests/a_unit/migrate/test_migrate_brew.py that drive the brew→PCL translator against every install-block shape. See Conformance for the per-stage breakdown.
Daily loop¶
make test # uv run pytest (fast — runs the whole suite in ~10 s)
make lint # ruff + ty + pyrefly + mypy
make check # alias for lint
make format # ruff format + ruff check --fix
make test-cov # pytest with HTML + term coverage report
The matrix runs in CI on Python 3.13 and 3.14:
nox -s tests # run the test suite across the matrix locally
nox -s check # lint/format/typecheck on Python 3.12 (the gate)
Where to put your test¶
The decision tree:
- Are you proving an architectural property that affects multiple components (atomic deploy, secret hash-exclusion, content-addressed determinism)? → New entry in tests/c_e2e/test_conformance_stage*.py. This is a CI release-blocker.
- Are you exercising a CLI subcommand end-to-end (a full punix install invocation, a deploy + rollback flow)? → tests/c_e2e/test_<command>*.py.
- Are you testing a component's interaction with another (manifest serialisation through real LocalTransport, a recipe class's generated script against a real shell)? → tests/b_integration/.
- Are you proving a pure-function property (parser accepts X; canonicalisation makes Y equivalent to Z; translator's regex catches the cargo-with-features shape)? → tests/a_unit/<package>/ (the package under test; a top-level module's tests stay flat at tests/a_unit/).
Default to unit. Promote to integration only when real I/O matters. Promote to e2e only when subprocess + multi-step workflow matters.
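To make the unit tier concrete, here is a sketch of a pure-function property test of the "canonicalisation makes Y equivalent to Z" kind. The canonical_hash function is a hypothetical stand-in built on json and hashlib, not Punix's real hashing code. It only illustrates the shape of such a test: no I/O, no fixtures, plain assertions on return values.

```python
import hashlib
import json


def canonical_hash(recipe: dict) -> str:
    """Hypothetical stand-in: hash a recipe independent of field order."""
    # Sorting keys and fixing separators gives one byte string per logical value.
    payload = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def test_field_order_does_not_change_hash():
    a = {"name": "foo", "version": "1.0", "deps": ["zlib"]}
    b = {"deps": ["zlib"], "version": "1.0", "name": "foo"}
    assert canonical_hash(a) == canonical_hash(b)


def test_content_change_changes_hash():
    a = {"name": "foo", "version": "1.0"}
    b = {"name": "foo", "version": "1.1"}
    assert canonical_hash(a) != canonical_hash(b)
```

A test like this would sit under tests/a_unit/<package>/ for whichever package owns the function under test.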
Recipe-side testing: the dogfood loop¶
The make dogfood target builds every recipe in packages/official/ + packages/seeds/ from source. A red dogfood blocks the release.
make dogfood-list # show what's in the dogfood corpus
make dogfood-dry # --dry-run: type-check + build closure pre-pass, no compile
make dogfood # actually build everything (cache-hits if nothing changed)
make dogfood-translate # regenerate packages/experimental/from-homebrew/ from brew
make dogfood-check # translate + check, no compile (fast feedback for translator changes)
make dogfood-smoke # build one recipe per build-system family
Workflow for adding a recipe:
# 1. Translate or hand-write packages/official/foo.pcl
# 2. Build just it (with the full corpus available as deps)
d=$(mktemp -d -t punix-XXXXXX)
cp packages/official/*.pcl packages/seeds/*.pcl "$d/"
uv run punix build "$d" --only foo
# 3. Confirm no regression elsewhere
make dogfood
If your recipe changes a shared std.* class's behaviour, the cache invalidates for every recipe using that class — expect a multi-minute rebuild on the first make dogfood.
Translator changes¶
When editing src/punix/migrate/brew.py:
# Unit tests for translator patterns (95 cases)
uv run pytest tests/a_unit/migrate/test_migrate_brew.py
# Fast feedback against the real homebrew snapshot
make dogfood-check
# Broader validation: build one recipe per build-system family (~5 min cold)
make dogfood-smoke
Adding a new translator pattern? Write the unit test first — pick a representative brew formula, paste a minimal version into the test, assert the translated output. Examples at the bottom of test_migrate_brew.py.
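The test-first flow might look like the sketch below. It is a toy under stated assumptions: the CARGO_FEATURES regex and match_cargo_features helper are hypothetical and do not reflect the real API of src/punix/migrate/brew.py, and the result is a plain list rather than PCL, whose syntax is not shown here. The input line follows the common Homebrew formula form for a cargo install with features.

```python
from __future__ import annotations

import re

# Hypothetical pattern for the cargo-with-features install shape.
CARGO_FEATURES = re.compile(
    r'system\s+"cargo",\s*"install",\s*"--features",\s*"(?P<features>[^"]+)"'
)


def match_cargo_features(line: str) -> list[str] | None:
    """Return the feature names if the line has the shape, else None."""
    m = CARGO_FEATURES.search(line)
    if m is None:
        return None
    # Cargo accepts comma- or space-separated feature lists.
    return [f for f in re.split(r"[,\s]+", m.group("features")) if f]


def test_cargo_with_features_shape():
    line = 'system "cargo", "install", "--features", "tls,cli", *std_cargo_args'
    assert match_cargo_features(line) == ["tls", "cli"]


def test_plain_cargo_install_is_not_matched():
    line = 'system "cargo", "install", *std_cargo_args'
    assert match_cargo_features(line) is None
```

Pairing the positive case with a near-miss negative case keeps a new pattern from silently swallowing a neighbouring shape.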
Hermeticity contract¶
Tests never depend on the host's package manager state, network access, or installed binaries beyond the dev-env prerequisites (git, curl, make, clang).
- LocalTransport(tmp_path) is a real implementation of the Transport Protocol writing to a sandbox dir. Use it instead of mocking.
- tmp_path is a pytest fixture; pass it as the root for any deploy-side fixture.
- The realise sandbox writes to a per-build temp dir. No ~/.punix/store/ mutation during unit/integration tests.
- The conformance suite uses real subprocess calls for git, cargo, gcc, etc.; these are the only "external dependency" tests. They run under the e2e marker.
Property tests with stubs over mocks¶
Prefer stubs over mocks. Verify state, not behaviour.
from unittest.mock import MagicMock  # used only by the "avoid" example

# Good: state assertion against a real Transport.
def test_deploy_writes_unit_file(tmp_path):
    transport = LocalTransport(tmp_path)
    deploy(stack, transport=transport)
    assert (tmp_path / "etc/systemd/system/api.service").exists()

# Avoid: mock assertion over internal call patterns.
def test_deploy_calls_write_method():
    transport = MagicMock()
    deploy(stack, transport=transport)
    transport.write.assert_called_with("/etc/systemd/system/api.service", ...)
The first survives refactors that change the internal call sequence. The second breaks the moment you move the write call into a helper.
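A self-contained sketch of why the state assertion is refactor-proof follows. Everything in it is a stand-in: this Transport protocol, this LocalTransport, and both deploy variants are hypothetical simplifications rather than Punix's real classes. The assumption is that a local transport re-roots absolute target paths under its sandbox directory.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class Transport(Protocol):
    def write(self, path: str, data: str) -> None: ...


class LocalTransport:
    """Stand-in transport: re-roots absolute paths under a sandbox dir."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def write(self, path: str, data: str) -> None:
        target = self.root / path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(data)


UNIT_BODY = "[Unit]\nDescription=api\n"


def deploy_v1(transport: Transport) -> None:
    # Original shape: write directly.
    transport.write("/etc/systemd/system/api.service", UNIT_BODY)


def _write_unit(transport: Transport, name: str, body: str) -> None:
    transport.write(f"/etc/systemd/system/{name}.service", body)


def deploy_v2(transport: Transport) -> None:
    # Refactored shape: the write moved into a helper.
    _write_unit(transport, "api", UNIT_BODY)


def unit_file_exists(root: Path) -> bool:
    """The state assertion: inspect the sandbox, not the call sequence."""
    return (Path(root) / "etc/systemd/system/api.service").exists()
```

The same unit_file_exists check passes against both deploy variants, because it looks only at what ended up on disk.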
Type-checker¶
Three type checkers run in parallel (the make check gate):
uv run ty # primary fast checker
uv run pyrefly # secondary; catches different edge cases
uv run mypy src # legacy gate; strictest on src/
Disagreements between them are real signals — when one passes and another fails, the one that fails usually found something subtle. Don't silence; investigate.
Coverage¶
make test-cov # HTML report at htmlcov/index.html
make coverage # top-500 brew-coverage report (separate metric — what % of brew's analytics-top-500 fully resolves through the corpus)
Two separate "coverage" numbers, both useful:
- Code coverage: % of src/punix/ lines hit by the test suite. Standard.
- Corpus coverage: % of Homebrew's top-500 install-on-request set whose dep closure is resolvable through packages/official/. See Corpus status.
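The corpus-coverage definition can be read as a small graph computation. The sketch below is illustrative only: the closure and corpus_coverage functions and the sample data are hypothetical, and the real make coverage report may be computed differently. A target counts as resolved only when every package in its transitive dependency closure, itself included, is available in the corpus.

```python
def closure(name: str, deps: dict) -> set:
    """Transitive dependency closure of a package, including the package."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return seen


def corpus_coverage(targets: list, deps: dict, available: set) -> float:
    """Percentage of targets whose whole closure is available in the corpus."""
    if not targets:
        return 0.0
    resolved = [t for t in targets if closure(t, deps) <= available]
    return 100.0 * len(resolved) / len(targets)


# Hypothetical data: "b" fails through its transitive dep "z"; "d" is absent.
deps = {"a": ["x"], "b": ["y"], "y": ["z"]}
available = {"a", "x", "b", "y", "c"}
print(corpus_coverage(["a", "b", "c", "d"], deps, available))  # 50.0
```

Note that one missing leaf dependency is enough to fail a target, which is why this number can lag well behind the raw count of recipes in the corpus.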
Related¶
- Conformance — the property list and the per-stage test breakdown.
- Contributing — environment setup + daily workflow.
- Adding a package — the dogfood loop in detail.