Hermetic builds

Punix's whole content-addressing promise — same inputs ⇒ same hash ⇒ same artifact — rests on one assumption: that the build's inputs are fully captured by the hash. A go build that pulls modules off the network at build time breaks that assumption. The resolved dependency closure is an input to the artifact, but it never appears in the canonical derivation, so two builds that fetch different transitive sets collide on a single store path. This page is about closing that hole — making polyglot, network-fetching builds reproducible, offline-capable, and cache-correct, by reusing primitives Punix already has.

The problem: network at build time

The recipe library can already express a real multi-stage, polyglot build: a frontend asset pipeline, then a backend compile, with toolchains as dependencies and one derivation's store path feeding another's dep. What it could not do was acquire dependencies in a way the derivation hash accounts for. Every language builder fetches its deps over the network at build time: go build pulls modules, npm install hits the registry, cargo build hits crates.io. "Hermetic" in the older recipe docstrings meant only "no $HOME writes, no caches outside the build dir", not "no network".

Three consequences, all real:

  • Not reproducible. A re-tagged or yanked package, or a moved ref, silently changes the output. Nothing pins the resolved closure.
  • Not cache-correct. The canonical derivation hash captures recipe id + arg/hook bodies + recipe-code hash + source hash + system + sorted inputs — but it does not capture network-resolved dep versions. Two builds resolving different transitive sets land on the same store path. This is a correctness hole, not a purity nicety.
  • No isolation. The default build sandbox is SimpleSandbox, which runs the script on the host with full network access (the registered deviation D-D).

The solution: vendor as a fixed-output input

The fix is the Nix-proven vendor-as-fixed-output-input model, assembled below the eval/realise seam entirely from pieces Punix already ships. There is no new hash machinery.

The fixed-output derivation is the single auditable network step

A fixed_output (FOD) derivation is the one place network access is allowed, and it pays for that privilege by pinning its result to a content hash. For each network-fetching language, dependency acquisition becomes a separate derivation whose source.type = "fixed_output":

  1. It runs the resolve-and-download command (go mod vendor, npm ci, cargo vendor, …) with the network on.
  2. Its realised tree is hashed — order-, mtime-, and owner-independent (compute_tree_hash, keeping only the executable bit) — and compared against the author-declared hash (the vendorHash / depsHash).
  3. On any drift, the build fails with [E13] (OutputHashMismatchError), carrying both the expected and the observed hash so the operator can choose between "bump the declared hash" and "the upstream was tampered with".
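The three steps above can be sketched as follows. This is a minimal illustration, not Punix's compute_tree_hash: the names tree_hash, verify_tree, and HashMismatch are hypothetical, the digest layout is an assumption, and symlinks and empty directories are ignored for brevity. It shows only the property the text describes: the hash depends on relative paths, file contents, and the executable bit, and a mismatch reports both hashes.

```python
import hashlib
import os
import stat


class HashMismatch(Exception):
    """Hypothetical stand-in for OutputHashMismatchError ([E13])."""

    def __init__(self, expected: str, observed: str):
        super().__init__(f"[E13] expected {expected}, observed {observed}")
        self.expected = expected
        self.observed = observed


def tree_hash(root: str) -> str:
    """Hash a tree by relative path, content, and executable bit only."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, full))

    digest = hashlib.sha256()
    # Sorting by relative path makes the result independent of walk order.
    for rel, full in sorted(entries):
        # Only the executable bit is kept; mtime and owner are never read.
        flag = "x" if os.stat(full).st_mode & stat.S_IXUSR else "-"
        with open(full, "rb") as fh:
            content = hashlib.sha256(fh.read()).hexdigest()
        digest.update(f"{rel}\0{flag}\0{content}\n".encode())
    return digest.hexdigest()


def verify_tree(root: str, declared: str) -> str:
    """Compare the realised tree against the author-declared hash."""
    observed = tree_hash(root)
    if observed != declared:
        raise HashMismatch(expected=declared, observed=observed)
    return observed
```

Calling verify_tree with a placeholder hash raises HashMismatch, whose observed attribute is the value an author would paste back into the declaration.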

The build derivation then lists this vendor FOD in its deps and consumes the vendored tree offline (e.g. go build -mod=vendor). This is the same model as Nix's buildGoModule with vendorHash, expressed on Punix's existing FOD.

The declared hash is a trust anchor, not a derived fact

The guarantee is precise: same declared hash ⇒ same vendored store path ⇒ reproducible build. It pins whatever the network served to a content hash an author reviewed once. Verification runs when the FOD is built; once the tree sits in the store under its content-addressed path, a store cache hit skips re-verification. The authoring loop is the [E13] "wrong-hash → correct-hash" workflow: declare a sentinel hash, realise, read the reported actual hash, paste it back.

This is emphatically not import-from-derivation. IFD would compute the vendor hash during evaluation by running a build — pushing an effect into the pure layer and breaking the seam. Here the hash is an author-declared string literal in the PCL source record. The evaluator reads it (via _jsonable) and never runs a fetch or reads a build output back into eval. The dep hash is a fact the author asserts and the realiser checks, never a fact the evaluator discovers.

Cache-correctness comes for free by composition

No new hash field is added anywhere. The mechanism is pure composition:

  • A fixed_output derivation's own store path is a function of its declared hash (the hash rides in that FOD's canonical_json source field).
  • canonical_json already folds the build's inputs — the sorted dep store paths — into the build's hash.

So: name the vendor FOD as a dep of the build, and the resolved-dep content hash enters the build's canonical hash transitively. Two distinct declared vendor hashes ⇒ two distinct vendor store paths ⇒ two distinct build store paths, by construction. This is pinned by a conformance test (tests/a_unit/realise/test_adr025_vendor_cache_correctness.py).
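The composition argument can be checked with a toy model. The sketch below is an assumption-laden illustration, not Punix's canonical_json: the field names, the /store/ prefix, and the helpers store_path, vendor_fod, and build_derivation are all hypothetical. It shows only that when the vendor FOD's path depends on its declared hash, and the build's hash folds in its sorted dep paths, a different declared hash produces a different build path.

```python
import hashlib
import json


def store_path(derivation: dict) -> str:
    """Derive a store path from a canonical JSON encoding (toy model)."""
    canonical = json.dumps(derivation, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:32]
    return f"/store/{digest}-{derivation['name']}"


def vendor_fod(declared_hash: str) -> dict:
    """A fixed-output derivation: the declared hash rides in its source field."""
    return {
        "name": "app-vendor",
        "source": {"type": "fixed_output", "hash": declared_hash},
        "inputs": [],
    }


def build_derivation(dep_paths: list) -> dict:
    """The build derivation: its inputs are the sorted dep store paths."""
    return {
        "name": "app",
        "source": {"type": "local", "hash": "src-hash"},
        "inputs": sorted(dep_paths),
    }


# Two different declared vendor hashes give two vendor paths,
# and therefore two build paths, with no extra hash field on the build.
vendor_a = store_path(vendor_fod("sha256-aaaa"))
vendor_b = store_path(vendor_fod("sha256-bbbb"))
build_a = store_path(build_derivation([vendor_a]))
build_b = store_path(build_derivation([vendor_b]))
assert vendor_a != vendor_b
assert build_a != build_b
```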

The build runs offline when the host can enforce it

Network isolation is a property of the sandbox. RealiseLocal._build_sandbox selects one per build:

  • If any of the build's deps is named with the -vendor suffix — the marker that says "this build is offline; its deps are pinned in a vendor FOD" — and a network-off bwrap sandbox can actually run here, the build runs in BubblewrapSandbox(enable_network=False) (bwrap --unshare-all, no --share-net).
  • Otherwise it falls back to the configured sandbox (SimpleSandbox), where the recipe's own GOPROXY=off / -mod=vendor keeps the build offline in effect even without kernel-level isolation.
  • A build with no vendor dep keeps today's behaviour exactly — this is opt-in per recipe, not a global flip that would break the existing network-building corpus at once.

Offline mode is inferred from the -vendor dep marker rather than threaded through a new PCL field.
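The selection rule above reduces to a small decision function. This is a sketch under stated assumptions, not the body of RealiseLocal._build_sandbox: SandboxChoice and select_sandbox are hypothetical names, and the probe result is passed in as a plain boolean.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SandboxChoice:
    kind: str             # "bubblewrap" or "configured" (hypothetical labels)
    enable_network: bool


def select_sandbox(dep_names: Sequence[str], net_off_usable: bool) -> SandboxChoice:
    """Pick a sandbox from the dep names and the cached probe result."""
    # The -vendor suffix is the marker that the build's deps are pinned in a FOD.
    wants_offline = any(name.endswith("-vendor") for name in dep_names)
    if wants_offline and net_off_usable:
        return SandboxChoice("bubblewrap", enable_network=False)
    # Fallback: the configured sandbox; any offline behaviour then comes
    # from the recipe itself (GOPROXY=off, -mod=vendor).
    return SandboxChoice("configured", enable_network=True)
```

A build with no -vendor dep always takes the fallback branch, which is what keeps the behaviour opt-in per recipe.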

The net-off probe

The presence of the bwrap binary is necessary but not sufficient: bwrap --unshare-net needs unprivileged-user-namespace support. That is present on a normal Linux host and under docker run --privileged, but not inside an unprivileged docker build step ("Creating new namespace failed: Operation not permitted"). So net_off_sandbox_usable() probes once with bwrap --unshare-net — exercising the same unprivileged network-namespace unshare a net-off build relies on (a real build uses --unshare-all plus the full mount set) — and caches the answer. When the probe fails, the selector falls back to SimpleSandbox plus the recipe-level offline strategy — the macOS / D-D posture. Hermetic, network-off builds are therefore a Linux-deploy-target guarantee; macOS dev builds rest their reproducibility on the vendor FODs and lockfiles, not on network isolation. CI enforces the offline build on Linux.
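A probe of this kind might look like the sketch below. It is an assumption, not the source of net_off_sandbox_usable(): the function name net_off_probe is hypothetical, and the exact bwrap arguments (a read-only bind of / and the true command) are chosen here only to give bwrap something trivial to run. The shape is what matters: check for the binary, attempt the network-namespace unshare once, and cache the answer.

```python
import functools
import shutil
import subprocess


@functools.lru_cache(maxsize=1)
def net_off_probe() -> bool:
    """Return True if bwrap can unshare the network namespace on this host."""
    bwrap = shutil.which("bwrap")
    if bwrap is None:
        # The binary is necessary, so its absence settles the question.
        return False
    try:
        result = subprocess.run(
            [bwrap, "--unshare-net", "--ro-bind", "/", "/", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit covers "Creating new namespace failed" and similar.
    return result.returncode == 0
```

Because the result is cached with lru_cache, repeated sandbox selections in one process pay for the subprocess call only once.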

The live proof

This is demonstrated on a real polyglot Go application, not a toy. examples/tangled-deploy builds the Tangled knot + appview — a Tailwind asset stage plus two Go compiles — entirely offline:

  • TailwindBin is a FOD pinning the Tailwind binary to a content hash.
  • TangledVendor is a FOD running go mod vendor (network on) and pinning the resulting vendor/ tree to a content hash.
  • Tangled lists both as deps and compiles with go build -mod=vendor. On a Linux host where the net-off probe succeeds, the build runs in the network-off bwrap sandbox; elsewhere -mod=vendor keeps it offline-in-effect.

Verified in Docker: the packages build, the FOD hashes match on rebuild (reproducible), the pinned vendor hash rides into the build's canonical hash (cache-correct), the offline build succeeds, and the stack boots. The conformance coverage is in tests/a_unit/realise/test_adr025_vendor_cache_correctness.py (two hashes → two paths), tests/a_unit/realise/test_offline_sandbox_selection.py (the net-off branch, gated to Linux/privileged), and test_conformance_stage6_fixed_output.py (FOD drift → [E13]).

Honest residue

What is not yet hermetic, stated plainly (ADR-025):

  • The FOD prepare-source still runs net-on in SimpleSandbox. The vendor FOD's fetch step does not yet run in BubblewrapSandbox(enable_network=True); it runs on the host. Its output is hash-verified regardless, so the result is pinned, but the fetch environment is not isolated.
  • The toolchain is the image's, not a store dep. The go / gcc used to build are the build image's, not store-built toolchains pinned as deps. ADR-025 wants the toolchain in the vendor FOD's deps (the resolver version affects the lockfile→tree mapping); that is the plan-04 bootstrap axis, still ahead.
  • Per-language rollout awaits a driver. Only the Go path (go mod vendor) is exercised. std.npm (depsHash), std.cargo (cargoHash), and codegen-as-a-stage follow the same FOD-as-input pattern but stay unbuilt until a real consumer needs them. Remote codegen (e.g. buf BSR remote plugins) is out of scope by design — it executes un-pinned remote code.