AGENTS.md
Project Overview
Unified NixOS flake for three hosts:
| Host | Role | nixpkgs channel | Activation |
|---|---|---|---|
| mreow | Framework 13 AMD AI 300 laptop (niri, greetd, swaylock) | nixos-unstable | ./deploy.sh locally |
| yarn | AMD Zen 5 desktop (niri + Jovian-NixOS Steam deck mode, impermanence) | nixos-unstable | pull from CI binary cache |
| muffin | AMD Zen 3 server (Caddy, ZFS, agenix, deploy-rs, 25+ services) | nixos-25.11 | deploy-rs from CI |
One flake.nix declares both channels (nixpkgs and nixpkgs-stable) and composes each host from the correct channel. No single-channel migration is intended.
History pre-dating this repo lives in the merged subtree branches from dotfiles (commit e9a44f6) and server-config (commit 4bc5d57). Use git log <path> (without --follow) and traverse back through the merge commits dc481c2 and 6448a04 for pre-unify history.
Layout
flake.nix  # 3 hosts, 2 channels
deploy.sh  # wrapper: current-host rebuild or `muffin` deploy-rs
hosts/<host>/  # host entrypoints (default.nix, home.nix, disk.nix, …)
modules/  # flat namespace; see module naming below
  common.nix  # imported by ALL hosts (nix settings, doas, fish shim)
  desktop-*.nix  # imported by mreow/yarn only
  server-*.nix  # imported by muffin only
  <bare>.nix  # scoped by filename (age-secrets, zfs, no-rgb, …)
home/
  profiles/{gui,desktop,no-gui}.nix  # home-manager profiles
  progs/<program>.nix  # one file per program (fish, helix, niri, zen/, emacs, …)
  util/<helper>.nix  # small derivations
services/  # muffin-only: caddy, jellyfin, gitea, matrix, monero, …
tests/  # pkgs.testers.runNixOSTest suite
lib/
  default.nix  # extends nixpkgs-stable.lib with mkCaddyReverseProxy, serviceMountWithZpool, …
  overlays.nix  # jellyfin-exporter, igpu-exporter, reflac, ensureZfsMounts
patches/nixpkgs/  # applied to nixpkgs-stable for muffin builds
secrets/
  secrets.nix  # agenix recipients (who can decrypt each .age)
  desktop/  # agenix *.age (mreow + yarn) + disk-password (install-time only, git-crypt)
  home/  # git-crypt: per-user HM secrets (api keys, steam id)
  server/  # agenix *.age + git-crypt *.nix/*.tar/livekit_keys (muffin)
  usb-secrets/  # USB-resident agenix identity for muffin (git-crypt inside the repo)
Never read or write files under secrets/. They are encrypted at rest (git-crypt for plaintext, agenix for .age). The git-crypt key is delivered to muffin at runtime as /run/agenix/git-crypt-key-nixos.age.
Build & Deploy
# --- from any host ---
nix fmt # nixfmt-tree
nix flake update # bump both channels + inputs
nix flake update nixpkgs # bump just desktops' channel
nix flake update nixpkgs-stable # bump just muffin's channel
# --- per-host eval / build (add -L for verbose logs) ---
nix build .#nixosConfigurations.mreow.config.system.build.toplevel -L
nix build .#nixosConfigurations.yarn.config.system.build.toplevel -L
nix build .#nixosConfigurations.muffin.config.system.build.toplevel -L
# --- quick eval without building ---
nix eval .#nixosConfigurations.muffin.config.system.build.toplevel --no-build 2>&1 | head -5
# --- activate on current host (mreow / yarn only) ---
./deploy.sh # boot (default; next reboot)
./deploy.sh switch # apply immediately
./deploy.sh test # apply without boot entry
./deploy.sh build # build only
# --- deploy to muffin from anywhere ---
./deploy.sh muffin
# equivalent to:
nix run .#deploy -- .#muffin
# --- tests (muffin) ---
nix build .#packages.x86_64-linux.tests -L # all tests (slow)
nix build .#test-zfsTest -L # one test by name
# test names are the keys of tests/tests.nix; pattern is test-<name>
No unit tests for desktop configs. Validation is the nix build exit code plus the successful nix-diff against the previous generation.
If Nix complains about a missing file, git add it first — flakes only see tracked files.
Module naming
| Prefix | Meaning | Example |
|---|---|---|
| common- | imported by ALL hosts | common-doas.nix, common-nix.nix, common-shell-fish.nix |
| desktop- | imported by mreow + yarn only | desktop-common.nix, desktop-steam.nix, desktop-networkmanager.nix |
| server- | imported by muffin only | server-security.nix, server-power.nix, server-impermanence.nix, server-lanzaboote-agenix.nix |
| (none) | host-specific filename-scoped; see file contents | zfs.nix, no-rgb.nix (yarn + muffin) |
New modules: pick the narrowest prefix that's true, then add the import explicitly in the host's default.nix (there is no auto-discovery).
Code style
- Formatter: nixfmt-tree (declared in flake.nix). Run nix fmt before every commit.
- Indentation: 2 spaces, enforced by the formatter.
- Function args: one per line, trailing comma, always end with ...: { config, lib, pkgs, username, ... }:
- Imports: relative paths, one per line. Use the ../../modules/ style from hosts/; do not invent new aggregator modules unless more than one host uses the aggregation.
- Package paths: lib.getExe pkgs.foo over "${pkgs.foo}/bin/foo" when the derivation declares meta.mainProgram.
- Unfree packages: allowlisted per-module via nixpkgs.config.allowUnfreePredicate. Do not add a global permit.
- Comments: lowercase, # style. Use # TODO! / # BUG! / # FIX: prefixes for known issues that should be searchable.
- No trailing commas (Nix syntax forbids them).
- lib.mkDefault / lib.mkForce: prefer mkDefault in shared modules so hosts can override without fighting priority; use mkForce only to beat inherited defaults you can't reach any other way.
Secrets
- git-crypt covers secrets/** per the root .gitattributes. Initialized with a single symmetric key checked into secrets/server/git-crypt-key-nixos.age (agenix-encrypted to the USB SSH identity).
- agenix decrypts *.age into /run/agenix/ at activation on every host:
  - muffin: identity is /mnt/usb-secrets/usb-secrets-key (ssh-ed25519 on a physical USB). Wired in modules/usb-secrets.nix.
  - mreow + yarn: identity is /var/lib/agenix/tpm-identity (an age-plugin-tpm handle sealed by the host's TPM 2.0). Wired in modules/desktop-age-secrets.nix; yarn persists /var/lib/agenix through impermanence.
- Recipients are declared in secrets/secrets.nix. Desktop secrets are encrypted to the admin SSH key + each host's TPM recipient; server secrets stay encrypted to the muffin USB key.
- Bootstrap a new desktop: run doas scripts/bootstrap-desktop-tpm.sh on the host. It generates a TPM-sealed identity at /var/lib/agenix/tpm-identity and prints an age1tag1… recipient (legacy age1tpm1… recipients still decrypt but age-plugin-tpm 1.0+ refuses to encrypt to them; modules/desktop-age-secrets.nix symlinks age-plugin-tag → age-plugin-tpm so rage's plugin dispatch finds the binary under both prefixes). Append it to the tpm list in secrets/secrets.nix (label as a Nix # host comment, not as a trailing word inside the recipient string, because rage's bech32 parser rejects the trailing whitespace), run agenix -r to re-encrypt, commit, ./deploy.sh switch.
- Encrypting a new server secret uses the SSH public key directly with age -R:

      age -R <(ssh-keygen -y -f secrets/usb-secrets/usb-secrets-key) \
        -o secrets/server/<name>.age \
        /path/to/plaintext

  For desktop secrets, prefer agenix -e secrets/desktop/<name>.age from a shell with age-plugin-tpm on PATH; it reads secrets/secrets.nix and encrypts to every recipient listed there.
- DO NOT use ssh-to-age. It produces X25519 recipient stanzas, which the SSH private key on muffin cannot decrypt (it only decrypts ssh-ed25519 stanzas produced by age -R against the SSH pubkey). Mismatched stanzas show up as age: error: no identity matched any of the recipients at deploy time.
- Never read or commit plaintext secrets. Never log secret values.
Service pattern (muffin)
Each file under services/ follows this shape:
- imports block with lib.serviceMountWithZpool and (optionally) lib.serviceFilePerms.
- The service configuration (services.<name> = { … }).
- Caddy reverse-proxy vhost (usually via lib.mkCaddyReverseProxy in lib/default.nix).
- Firewall rules (networking.firewall.allowed{TCP,UDP}Ports) if externally reachable.
- services.fail2ban.jails.<name> if the service authenticates users.
Custom lib helpers (in lib/default.nix) to prefer over reinventing:
- lib.serviceMountWithZpool <service> <zpool> [dirs]
- lib.serviceFilePerms <service> [tmpfilesRules]
- lib.optimizePackage <pkg>: applies -O3 -march=znver3 -mtune=znver3
- lib.vpnNamespaceOpenPort <port> <service>: confines service to the WireGuard namespace
- lib.mkCaddyReverseProxy { subdomain|domain, port, auth ? false, vpn ? false }
- lib.mkFail2banJail { name, unitName ? "${name}.service", failregex }
- lib.mkGrafanaAnnotationService { name, description, script, after ? [], environment ? {}, loadCredential ? null }
- lib.extractArrApiKey <configXmlPath>: shell snippet to read the <ApiKey> element
Hard requirements that are asserted at eval time:
- Port uniqueness: every port in hosts/muffin/service-configs.nix ports.{public,private} must be unique. The flake asserts this.
- Public/private segregation: public ports must appear in the firewall allow-list; private ports must not. The flake asserts both directions.
- Hugepages: services that need 2 MiB hugepages declare their budget in service-configs.nix under hugepages_2m.services. The vm.nr_hugepages sysctl is derived from the total.
- PostgreSQL-first: any service that supports PostgreSQL uses it (via peer-auth Unix socket when possible). Per-service SQLite (or similar) is discouraged.
Deploy guard (muffin)
modules/server-deploy-guard.nix aggregates per-service "is anyone using this right now?" checks into a single deploy-guard-check binary on muffin. Enforcement is preflight-only — the guard runs over SSH before deploy-rs is invoked; activation itself is never gated. This matters because deploy-rs sets the new profile pointer before running the activation script, so a failed activation triggers auto-rollback which re-runs switch-to-configuration on the previous generation — that re-activation rotates agenix secrets, reinstalls lanzaboote, and reloads systemd units. The only safe place to stop a deploy is before deploy-rs starts.
Two drivers invoke the preflight:
- ./deploy.sh muffin SSHes to server-public and runs deploy-guard-check. SSH connection failure is a hard abort (rc=255) because there is no second gate. ./deploy.sh muffin --force (or DEPLOY_GUARD_FORCE=1 ./deploy.sh muffin) skips the preflight entirely.
- CI (.gitea/workflows/deploy.yml) has a Deploy guard preflight step between Build muffin and Deploy via deploy-rs. A non-zero exit fails the job before any closure copy or activation.
Adding a new check
In the service's own file (or a sibling <service>-deploy-guard.nix):
{ config, lib, pkgs, ... }:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-<service>";
    runtimeInputs = [ /* curl, jq, etc. */ ];
    text = ''
      # exit 0 when the service is idle / unreachable (soft-fail)
      # exit 1 with a reason on stdout/stderr when live users would be disrupted
    '';
  };
in
lib.mkIf config.services.<service>.enable {
  services.deployGuard.checks.<service> = {
    description = "Active <service> users";
    command = check;
  };
}
Existing registrations live in services/jellyfin/jellyfin-deploy-guard.nix (REST /Sessions via curl+jq) and services/minecraft-deploy-guard.nix (Server List Ping via mcstatus). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.
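The exit-code contract for a check body (0 for idle or unreachable, 1 with a reason when users are active) might look roughly like the sketch below. It assumes a caller-supplied command that prints the number of active users; the function name check_active and that calling convention are hypothetical and not taken from the existing Jellyfin or Minecraft checks.

```shell
# check_active CMD...: CMD must print the number of active users on stdout.
# Exit 0 when idle or unreachable (soft-fail), exit 1 with a reason otherwise.
check_active() {
  local count
  if ! count="$("$@" 2>/dev/null)"; then
    # a service that is already down has no users to disrupt
    echo "service unreachable, treating as idle"
    return 0
  fi
  if [ "${count:-0}" -gt 0 ]; then
    echo "$count active user(s) would be disrupted"
    return 1
  fi
  return 0
}
```

Keeping the query (a curl+jq pipeline, a Server List Ping, or similar) separate from the decision keeps the soft-fail rule in one place.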
Deploy finalize (muffin)
modules/server-deploy-finalize.nix solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct switch-to-configuration switch restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).
The fix is a two-phase activation wired into deploy.nodes.muffin.profiles.system.path in flake.nix:
- switch-to-configuration boot: bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
- deploy-finalize: schedules a detached systemd-run --on-active=N transient unit (default 60s). The unit is owned by pid1, so it survives the eventual runner restart. If /run/booted-system/{kernel,initrd,kernel-modules} differs from the new profile's, the unit runs systemctl reboot; otherwise it runs switch-to-configuration switch.
That is, reboot is dynamically gated on kernel/initrd/kernel-modules change. The 60s delay is tuned so the CI job (or manual ./deploy.sh muffin) has time to emit status/notification steps before the runner is recycled.
Back-to-back deploys supersede each other: each invocation cancels any still-pending deploy-finalize-*.timer before scheduling its own. deploy-finalize --dry-run prints the decision without scheduling anything — useful when debugging.
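The reboot-or-switch decision can be sketched as a comparison of the three symlink targets. This is a minimal illustration in the spirit of --dry-run: it only prints the decision, and the function names needs_reboot and finalize_action are hypothetical rather than the actual deploy-finalize code.

```shell
# needs_reboot BOOTED NEW: succeed when kernel, initrd or kernel-modules
# resolve to different paths in the two system directories.
needs_reboot() {
  local booted="$1" new="$2" part
  for part in kernel initrd kernel-modules; do
    if [ "$(readlink -f "$booted/$part")" != "$(readlink -f "$new/$part")" ]; then
      return 0
    fi
  done
  return 1
}

# finalize_action BOOTED NEW: print the action the finalize unit would take
finalize_action() {
  if needs_reboot "$1" "$2"; then
    echo reboot
  else
    echo switch
  fi
}
```

On a NixOS host, finalize_action /run/booted-system /nix/var/nix/profiles/system would print reboot after a kernel bump and switch otherwise, assuming the new generation is the current system profile.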
Prior art: the 3-path {kernel,initrd,kernel-modules} diff is lifted from nixpkgs's system.autoUpgrade module (the allowReboot = true branch) and was packaged the same way in obsidiansystems/obelisk#957. nixpkgs#185030 tracks lifting it into switch-to-configuration proper but has been stale since 2025-07. The self-deploy systemd-run detachment is the proposed fix from deploy-rs#153, also unmerged upstream.
Technical details
- Privilege escalation: doas everywhere; sudo is disabled on every host.
- Shell: fish. bash login shells re-exec into fish via programs.bash.interactiveShellInit (see modules/common-shell-fish.nix).
- Secure boot: lanzaboote. Every host extracts keys from an agenix-decrypted tar at activation: desktops via modules/desktop-lanzaboote-agenix.nix, muffin via modules/server-lanzaboote-agenix.nix.
- Impermanence: muffin is tmpfs-root with /persistent surviving reboots (modules/server-impermanence.nix); yarn binds /home/primary from /persistent (hosts/yarn/impermanence.nix).
- Disks: disko.
- Binary cache: muffin runs harmonia; desktops consume it at https://nix-cache.sigkill.computer.
- Kernel:
  - Desktops: linux-cachyos-bore-lto, processorOpt = "x86_64-v3" (see modules/desktop-common.nix, which also trims ~80 legacy subsystems).
  - muffin: linuxPackages_6_12 (pinned; 6.18 has a ZFS deadlock in dbuf_evict).
- Domain: sigkill.computer. The old gardling.com redirects automatically.
Agent-specific instructions
- If instructed to commit, disable GPG signing (git commit --no-gpg-sign). The author's GPG key is not available in this environment.
- Use nix-shell -p <package> if a tool is missing from the environment.
- For nix build, always append -L for verbose logs.
- If Nix reports a missing file, run git add <file> first; flakes only see git-tracked files.
- Do not read files under secrets/.
- Run nix fmt after editing any .nix file.
- Validate every change with nix build .#nixosConfigurations.<host>.config.system.build.toplevel -L.
- Commit messages are terse, lowercase; prefix with <scope>: when narrowly scoped (caddy: add redirect, zfs: remove unneeded options, mreow: bump kernel). Generic changes use update or a short description.
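Taken together, the instructions above suggest an edit loop along the following lines. This is only a sketch: the helper names run and change_workflow and the DRY_RUN switch are hypothetical, and the commit step applies only when a commit was actually requested.

```shell
# run CMD...: execute the command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# change_workflow HOST MESSAGE FILE...: track, format, validate, then commit
change_workflow() {
  local host="$1" msg="$2"
  shift 2
  run git add "$@" &&
    run nix fmt &&
    run nix build ".#nixosConfigurations.$host.config.system.build.toplevel" -L &&
    run git commit --no-gpg-sign -m "$msg"
}
```

Running it with DRY_RUN=1 first, for example DRY_RUN=1 change_workflow mreow "mreow: bump kernel" hosts/mreow/default.nix, prints the four commands without touching the repo.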