# AGENTS.md

## Project Overview

Unified NixOS flake for three hosts:

| Host | Role | nixpkgs channel | Activation |
|------|------|----------------|-----------|
| `mreow` | Framework 13 AMD AI 300 laptop (niri, greetd, swaylock) | `nixos-unstable` | `./deploy.sh` locally |
| `yarn` | AMD Zen 5 desktop (niri + Jovian-NixOS Steam deck mode, impermanence) | `nixos-unstable` | pull from CI binary cache |
| `muffin` | AMD Zen 3 server (Caddy, ZFS, agenix, deploy-rs, 25+ services) | `nixos-25.11` | deploy-rs from CI |

One `flake.nix` declares both channels (`nixpkgs` and `nixpkgs-stable`) and composes each host from the correct channel. No single-channel migration is intended.

History pre-dating this repo lives in the merged subtree branches from `dotfiles` (commit `e9a44f6`) and `server-config` (commit `4bc5d57`). Use `git log ` (without `--follow`) and traverse back through the merge commits `dc481c2` and `6448a04` for pre-unify history.

## Layout

```
flake.nix                 # 3 hosts, 2 channels
deploy.sh                 # wrapper: current-host rebuild or `muffin` deploy-rs
hosts//                   # host entrypoints (default.nix, home.nix, disk.nix, …)
modules/                  # flat namespace; see module naming below
  common-*.nix            # imported by ALL hosts (nix settings, doas, fish shim)
  desktop-*.nix           # imported by mreow/yarn only
  server-*.nix            # imported by muffin only
  .nix                    # scoped by filename (age-secrets, zfs, no-rgb, …)
home/
  profiles/{gui,desktop,no-gui}.nix   # home-manager profiles
  progs/.nix              # one file per program (fish, helix, niri, zen/, emacs, …)
  util/.nix               # small derivations
services/                 # muffin-only: caddy, jellyfin, gitea, matrix, monero, …
tests/                    # pkgs.testers.runNixOSTest suite
lib/
  default.nix             # extends nixpkgs-stable.lib with mkCaddyReverseProxy, serviceMountWithZpool, …
  overlays.nix            # jellyfin-exporter, igpu-exporter, reflac, ensureZfsMounts
patches/nixpkgs/          # applied to nixpkgs-stable for muffin builds
secrets/
  secrets.nix             # agenix recipients (who can decrypt each .age)
  desktop/                # agenix *.age (mreow + yarn) + disk-password (install-time only, git-crypt)
  home/                   # git-crypt: per-user HM secrets (api keys, steam id)
  server/                 # agenix *.age + git-crypt *.nix/*.tar/livekit_keys (muffin)
  usb-secrets/            # USB-resident agenix identity for muffin (git-crypt inside the repo)
```

**Never read or write files under `secrets/`.** They are encrypted at rest (git-crypt for plaintext, agenix for `.age`). The git-crypt key is delivered to `muffin` at runtime as `/run/agenix/git-crypt-key-nixos.age`.

## Build & Deploy

```sh
# --- from any host ---
nix fmt                                        # nixfmt-tree
nix flake update                               # bump both channels + inputs
nix flake update --input-name nixpkgs          # bump just desktops' channel
nix flake update --input-name nixpkgs-stable   # bump just muffin's channel

# --- per-host eval / build (add -L for verbose logs) ---
nix build .#nixosConfigurations.mreow.config.system.build.toplevel -L
nix build .#nixosConfigurations.yarn.config.system.build.toplevel -L
nix build .#nixosConfigurations.muffin.config.system.build.toplevel -L

# --- quick eval without building ---
nix eval .#nixosConfigurations.muffin.config.system.build.toplevel --no-build 2>&1 | head -5

# --- activate on current host (mreow / yarn only) ---
./deploy.sh          # boot (default; next reboot)
./deploy.sh switch   # apply immediately
./deploy.sh test     # apply without boot entry
./deploy.sh build    # build only

# --- deploy to muffin from anywhere ---
./deploy.sh muffin   # equivalent to: nix run .#deploy -- .#muffin

# --- tests (muffin) ---
nix build .#packages.x86_64-linux.tests -L   # all tests (slow)
nix build .#test-zfsTest -L                  # one test by name
# test names are the keys of tests/tests.nix; pattern is test-
```

No unit tests for desktop configs. Validation is the `nix build` exit code plus a successful `nix-diff` against the previous generation.

If Nix complains about a missing file, `git add` it first — flakes only see tracked files.

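The three per-host `nix build` invocations above can be wrapped in a small loop. This is a minimal sketch, not repo tooling: the function names, the `HOSTS` list variable, and the `NIX` override are assumptions introduced here for illustration; only the attribute path is taken from the commands above.

```sh
# sketch: build the toplevel closure of every host in one pass.
# HOSTS, NIX, toplevel_attr and build_all_hosts are hypothetical names.
set -eu

HOSTS="mreow yarn muffin"
NIX="${NIX:-nix}"   # set NIX=echo to print the commands instead of running them

# print the flake attribute for one host's system closure
toplevel_attr() {
  printf '.#nixosConfigurations.%s.config.system.build.toplevel\n' "$1"
}

# build each host in turn; set -e stops at the first failing build
build_all_hosts() {
  for host in $HOSTS; do
    "$NIX" build "$(toplevel_attr "$host")" -L
  done
}
```

Sourcing the snippet and calling `build_all_hosts` runs the builds in order; running it with `NIX=echo` first shows exactly which commands would be issued.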
## Module naming

| Prefix | Meaning | Example |
|--------|---------|---------|
| `common-` | imported by ALL hosts | `common-doas.nix`, `common-nix.nix`, `common-shell-fish.nix` |
| `desktop-` | imported by mreow + yarn only | `desktop-common.nix`, `desktop-steam.nix`, `desktop-networkmanager.nix` |
| `server-` | imported by muffin only | `server-security.nix`, `server-power.nix`, `server-impermanence.nix`, `server-lanzaboote-agenix.nix` |
| *(none)* | host-specific, scoped by filename; see file contents | `zfs.nix`, `no-rgb.nix` (yarn + muffin) |

New modules: pick the narrowest prefix that's true, then add the import explicitly in the host's `default.nix` (there is no auto-discovery).

## Code style

- **Formatter**: `nixfmt-tree` (declared in `flake.nix`). Run `nix fmt` before every commit.
- **Indentation**: 2 spaces, enforced by the formatter.
- **Function args**: one per line, each followed by a comma, always ending with `...`:
  ```nix
  {
    config,
    lib,
    pkgs,
    username,
    ...
  }:
  ```
- **Imports**: relative paths, one per line. Use the `../../modules/` style from `hosts/`; do not invent new aggregator modules unless more than one host uses the aggregation.
- **Package paths**: `lib.getExe pkgs.foo` over `"${pkgs.foo}/bin/foo"` when the derivation declares `meta.mainProgram`.
- **Unfree packages**: allowlisted per-module via `nixpkgs.config.allowUnfreePredicate`. Do not add a global permit.
- **Comments**: lowercase, `#` style. Use `# TODO!` / `# BUG!` / `# FIX:` prefixes for known issues that should be searchable.
- **No trailing commas** after the final `...` in an argument set (Nix syntax forbids them).
- **`lib.mkDefault` / `lib.mkForce`**: prefer `mkDefault` in shared modules so hosts can override without fighting priority; use `mkForce` only to beat inherited defaults you can't reach any other way.

## Secrets

- **git-crypt** covers `secrets/**` per the root `.gitattributes`. Initialized with a single symmetric key checked into `secrets/server/git-crypt-key-nixos.age` (agenix-encrypted to the USB SSH identity).
- **agenix** decrypts `*.age` into `/run/agenix/` at activation on every host:
  - **muffin**: identity is `/mnt/usb-secrets/usb-secrets-key` (ssh-ed25519 on a physical USB). Wired in `modules/usb-secrets.nix`.
  - **mreow + yarn**: identity is `/var/lib/agenix/tpm-identity` (an `age-plugin-tpm` handle sealed by the host's TPM 2.0). Wired in `modules/desktop-age-secrets.nix`; yarn persists `/var/lib/agenix` through impermanence.
- **Recipients** are declared in `secrets/secrets.nix`. Desktop secrets are encrypted to the admin SSH key + each host's TPM recipient; server secrets stay encrypted to the muffin USB key.
- **Bootstrap a new desktop**: run `doas scripts/bootstrap-desktop-tpm.sh` on the host. It generates a TPM-sealed identity at `/var/lib/agenix/tpm-identity` and prints an `age1tpm1…` recipient. Append it to the `tpm` list in `secrets/secrets.nix`, run `agenix -r` to re-encrypt, commit, `./deploy.sh switch`.
- **Encrypting a new server secret** uses the SSH public key directly with `age -R`:
  ```sh
  age -R <(ssh-keygen -y -f secrets/usb-secrets/usb-secrets-key) \
    -o secrets/server/.age \
    /path/to/plaintext
  ```
  For desktop secrets, prefer `agenix -e secrets/desktop/.age` from a shell with `age-plugin-tpm` on PATH — it reads `secrets/secrets.nix` and encrypts to every recipient listed there.
- **DO NOT use `ssh-to-age`**. It produces `X25519` recipient stanzas, which the SSH private key on muffin cannot decrypt (it only decrypts `ssh-ed25519` stanzas produced by `age -R` against the SSH pubkey). Mismatched stanzas show up as `age: error: no identity matched any of the recipients` at deploy time.
- Never read or commit plaintext secrets. Never log secret values.

## Service pattern (muffin)

Each file under `services/` follows this shape:

1. `imports` block with `lib.serviceMountWithZpool` and (optionally) `lib.serviceFilePerms`.
2. The service configuration (`services. = { … }`).
3. Caddy reverse-proxy vhost (usually via `lib.mkCaddyReverseProxy` in `lib/default.nix`).
4. Firewall rules (`networking.firewall.allowed{TCP,UDP}Ports`) if externally reachable.
5. `services.fail2ban.jails.` if the service authenticates users.

Custom lib helpers (in `lib/default.nix`) to prefer over reinventing:

- `lib.serviceMountWithZpool [dirs]`
- `lib.serviceFilePerms [tmpfilesRules]`
- `lib.optimizePackage ` — applies `-O3 -march=znver3 -mtune=znver3`
- `lib.vpnNamespaceOpenPort ` — confines service to the WireGuard namespace
- `lib.mkCaddyReverseProxy { subdomain|domain, port, auth ? false, vpn ? false }`
- `lib.mkFail2banJail { name, unitName ? "${name}.service", failregex }`
- `lib.mkGrafanaAnnotationService { name, description, script, after ? [], environment ? {}, loadCredential ? null }`
- `lib.extractArrApiKey ` — shell snippet to read the `` element

Hard requirements that are asserted at eval time:

- **Port uniqueness**: every port in `hosts/muffin/service-configs.nix` `ports.{public,private}` must be unique. The flake asserts this.
- **Public/private segregation**: public ports must appear in the firewall allow-list; private ports must not. The flake asserts both directions.
- **Hugepages**: services that need 2 MiB hugepages declare their budget in `service-configs.nix` under `hugepages_2m.services`. The `vm.nr_hugepages` sysctl is derived from the total.
- **PostgreSQL-first**: any service that supports PostgreSQL uses it (via peer-auth Unix socket when possible). Per-service SQLite (or similar) is discouraged.

## Deploy guard (muffin)

`modules/server-deploy-guard.nix` aggregates per-service "is anyone using this right now?" checks into a single `deploy-guard-check` binary on muffin.

Enforcement is **preflight-only** — the guard runs over SSH *before* deploy-rs is invoked; activation itself is never gated. This matters because deploy-rs sets the new profile pointer before running the activation script, so a failed activation triggers auto-rollback which re-runs `switch-to-configuration` on the previous generation — that re-activation rotates agenix secrets, reinstalls lanzaboote, and reloads systemd units. The only safe place to stop a deploy is before deploy-rs starts.

Two drivers invoke the preflight:

- **`./deploy.sh muffin`** SSHes to `server-public` and runs `deploy-guard-check`. SSH connection failure is a hard abort (rc=255) because there is no second gate. `./deploy.sh muffin --force` (or `DEPLOY_GUARD_FORCE=1 ./deploy.sh muffin`) skips the preflight entirely.
- **CI (`.gitea/workflows/deploy.yml`)** has a `Deploy guard preflight` step between `Build muffin` and `Deploy via deploy-rs`. A non-zero exit fails the job before any closure copy or activation.

### Adding a new check

In the service's own file (or a sibling `-deploy-guard.nix`):

```nix
{ config, lib, pkgs, ... }:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-";
    runtimeInputs = [ /* curl, jq, etc. */ ];
    text = ''
      # exit 0 when the service is idle / unreachable (soft-fail)
      # exit 1 with a reason on stdout/stderr when live users would be disrupted
    '';
  };
in
lib.mkIf config.services..enable {
  services.deployGuard.checks. = {
    description = "Active users";
    command = check;
  };
}
```

Existing registrations live in `services/jellyfin/jellyfin-deploy-guard.nix` (REST `/Sessions` via curl+jq) and `services/minecraft-deploy-guard.nix` (Server List Ping via `mcstatus`). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.

## Deploy finalize (muffin)

`modules/server-deploy-finalize.nix` solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct `switch-to-configuration switch` restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).

The fix is a two-phase activation wired into `deploy.nodes.muffin.profiles.system.path` in `flake.nix`:

1. `switch-to-configuration boot` — bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
2. `deploy-finalize` — schedules a detached `systemd-run --on-active=N` transient unit (default 60s). The unit is owned by pid1, so it survives the eventual runner restart. If `/run/booted-system/{kernel,initrd,kernel-modules}` differs from the new profile's, the unit runs `systemctl reboot`; otherwise it runs `switch-to-configuration switch`.

That is, reboot is dynamically gated on kernel/initrd/kernel-modules change. The 60s delay is tuned so the CI job (or manual `./deploy.sh muffin`) has time to emit status/notification steps before the runner is recycled.

Back-to-back deploys supersede each other: each invocation cancels any still-pending `deploy-finalize-*.timer` before scheduling its own. `deploy-finalize --dry-run` prints the decision without scheduling anything — useful when debugging.

Prior art: the 3-path `{kernel,initrd,kernel-modules}` diff is lifted from nixpkgs's `system.autoUpgrade` module (the `allowReboot = true` branch) and was packaged the same way in [obsidiansystems/obelisk#957](https://github.com/obsidiansystems/obelisk/pull/957). nixpkgs#185030 tracks lifting it into `switch-to-configuration` proper but has been stale since 2025-07. The self-deploy `systemd-run` detachment is the proposed fix from [deploy-rs#153](https://github.com/serokell/deploy-rs/issues/153), also unmerged upstream.

## Technical details

- **Privilege escalation**: `doas` everywhere; `sudo` is disabled on every host.
- **Shell**: fish. `bash` login shells re-exec into fish via `programs.bash.interactiveShellInit` (see `modules/common-shell-fish.nix`).
- **Secure boot**: lanzaboote. Every host extracts keys from an agenix-decrypted tar at activation — desktops via `modules/desktop-lanzaboote-agenix.nix`, muffin via `modules/server-lanzaboote-agenix.nix`.
- **Impermanence**: muffin is tmpfs-root with `/persistent` surviving reboots (`modules/server-impermanence.nix`); yarn binds `/home/primary` from `/persistent` (`hosts/yarn/impermanence.nix`).
- **Disks**: disko.
- **Binary cache**: muffin runs harmonia; desktops consume it at `https://nix-cache.sigkill.computer`.
- **Kernel**:
  - Desktops: `linux-cachyos-bore-lto`, `processorOpt = "x86_64-v3"` (see `modules/desktop-common.nix` — also trims ~80 legacy subsystems).
  - muffin: `linuxPackages_6_12` (pinned; 6.18 has a ZFS deadlock in `dbuf_evict`).
- **Domain**: `sigkill.computer`. The old `gardling.com` redirects automatically.

## Agent-specific instructions

- If instructed to commit, **disable GPG signing** (`git commit --no-gpg-sign`). The author's GPG key is not available in this environment.
- Use `nix-shell -p ` if a tool is missing from the environment.
- For `nix build`, always append `-L` for verbose logs.
- If Nix reports a missing file, run `git add ` first — flakes only see git-tracked files.
- Do not read files under `secrets/`.
- Run `nix fmt` after editing any `.nix` file.
- Validate every change with `nix build .#nixosConfigurations..config.system.build.toplevel -L`.
- Commit messages are terse, lowercase; prefix with `:` when narrowly scoped (`caddy: add redirect`, `zfs: remove unneeded options`, `mreow: bump kernel`). Generic changes use `update` or a short description.
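
The agent instructions above can be chained into one edit-validate-commit pass. This is a rough sketch under stated assumptions: `validate_and_commit`, the `RUN` override, and the argument order are hypothetical and do not exist in the repo, and the file path in the usage note is only an example.

```sh
# sketch of the stage / format / build / commit sequence described above.
# validate_and_commit and RUN are hypothetical names, not repo tooling.
set -eu

RUN="${RUN:-}"   # set RUN=echo to print the commands instead of running them

# usage: validate_and_commit HOST "commit message" FILE...
validate_and_commit() {
  host="$1"
  msg="$2"
  shift 2
  $RUN git add -- "$@"   # flakes only see git-tracked files
  $RUN nix fmt           # format after editing any .nix file
  $RUN git add -- "$@"   # re-stage whatever the formatter rewrote
  $RUN nix build ".#nixosConfigurations.${host}.config.system.build.toplevel" -L
  $RUN git commit --no-gpg-sign -m "$msg"
}
```

For instance, `validate_and_commit muffin "caddy: add redirect" services/caddy.nix` (with a made-up file path) would stage the file, format, rebuild muffin's toplevel, and commit without GPG signing. Because of `set -e`, a failed build stops the sequence before the commit.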