
AGENTS.md

Project Overview

Unified NixOS flake for three hosts:

| Host | Role | nixpkgs channel | Activation |
|---|---|---|---|
| mreow | Framework 13 AMD AI 300 laptop (niri, greetd, swaylock) | nixos-unstable | ./deploy.sh locally |
| yarn | AMD Zen 5 desktop (niri + Jovian-NixOS Steam Deck mode, impermanence) | nixos-unstable | pull from CI binary cache |
| muffin | AMD Zen 3 server (Caddy, ZFS, agenix, deploy-rs, 25+ services) | nixos-25.11 | deploy-rs from CI |

One flake.nix declares both channels (nixpkgs and nixpkgs-stable) and composes each host from the correct channel. No single-channel migration is intended.
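
This is a minimal sketch of how one flake can compose hosts from two channels. It assumes plain nixpkgs.lib.nixosSystem calls. The real flake.nix also patches nixpkgs-stable, extends lib, and passes extra module arguments such as username, and all of that is omitted here.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    nixpkgs-stable.url = "github:NixOS/nixpkgs/nixos-25.11";
  };

  outputs =
    {
      nixpkgs,
      nixpkgs-stable,
      ...
    }:
    {
      nixosConfigurations = {
        # desktops evaluate against nixos-unstable
        mreow = nixpkgs.lib.nixosSystem {
          system = "x86_64-linux";
          modules = [ ./hosts/mreow ];
        };
        yarn = nixpkgs.lib.nixosSystem {
          system = "x86_64-linux";
          modules = [ ./hosts/yarn ];
        };
        # the server evaluates against the stable channel
        muffin = nixpkgs-stable.lib.nixosSystem {
          system = "x86_64-linux";
          modules = [ ./hosts/muffin ];
        };
      };
    };
}
```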

History pre-dating this repo lives in the merged subtree branches from dotfiles (commit e9a44f6) and server-config (commit 4bc5d57). Use git log <path> (without --follow) and traverse back through the merge commits dc481c2 and 6448a04 for pre-unify history.

Layout

flake.nix                  # 3 hosts, 2 channels
deploy.sh                  # wrapper: current-host rebuild or `muffin` deploy-rs
hosts/<host>/              # host entrypoints (default.nix, home.nix, disk.nix, …)
modules/                   # flat namespace; see module naming below
  common-*.nix             # imported by ALL hosts (nix settings, doas, fish shim)
  desktop-*.nix            # imported by mreow/yarn only
  server-*.nix             # imported by muffin only
  <bare>.nix               # scoped by filename (age-secrets, zfs, no-rgb, …)
home/
  profiles/{gui,desktop,no-gui}.nix   # home-manager profiles
  progs/<program>.nix                 # one file per program (fish, helix, niri, zen/, emacs, …)
  util/<helper>.nix                   # small derivations
services/                  # muffin-only: caddy, jellyfin, gitea, matrix, monero, …
tests/                     # pkgs.testers.runNixOSTest suite
lib/
  default.nix              # extends nixpkgs-stable.lib with mkCaddyReverseProxy, serviceMountWithZpool, …
  overlays.nix             # jellyfin-exporter, igpu-exporter, reflac, ensureZfsMounts
patches/nixpkgs/           # applied to nixpkgs-stable for muffin builds
secrets/
  secrets.nix              # agenix recipients (who can decrypt each .age)
  desktop/                 # agenix *.age (mreow + yarn) + disk-password (install-time only, git-crypt)
  home/                    # git-crypt: per-user HM secrets (api keys, steam id)
  server/                  # agenix *.age + git-crypt *.nix/*.tar/livekit_keys (muffin)
  usb-secrets/             # USB-resident agenix identity for muffin (git-crypt inside the repo)

Never read or write files under secrets/. They are encrypted at rest (git-crypt for plaintext, agenix for .age). The git-crypt key is delivered to muffin at runtime as /run/agenix/git-crypt-key-nixos.age.

Build & Deploy

# --- from any host ---
nix fmt                                                          # nixfmt-tree
nix flake update                                                 # bump both channels + inputs
nix flake update nixpkgs                                         # bump just desktops' channel
nix flake update nixpkgs-stable                                  # bump just muffin's channel

# --- per-host eval / build (add -L for verbose logs) ---
nix build .#nixosConfigurations.mreow.config.system.build.toplevel -L
nix build .#nixosConfigurations.yarn.config.system.build.toplevel -L
nix build .#nixosConfigurations.muffin.config.system.build.toplevel -L

# --- quick eval without building ---
nix eval --raw .#nixosConfigurations.muffin.config.system.build.toplevel.drvPath 2>&1 | head -5

# --- activate on current host (mreow / yarn only) ---
./deploy.sh                 # boot (default; next reboot)
./deploy.sh switch          # apply immediately
./deploy.sh test            # apply without boot entry
./deploy.sh build           # build only

# --- deploy to muffin from anywhere ---
./deploy.sh muffin
# runs the deploy-guard preflight over SSH first, then the equivalent of:
nix run .#deploy -- .#muffin

# --- tests (muffin) ---
nix build .#packages.x86_64-linux.tests -L        # all tests (slow)
nix build .#test-zfsTest -L                        # one test by name
# test names are the keys of tests/tests.nix; pattern is test-<name>

There are no unit tests for desktop configs. Validation is the nix build exit code plus a reviewed nix-diff against the previous generation.

If Nix complains about a missing file, git add it first — flakes only see tracked files.

Module naming

| Prefix | Meaning | Example |
|---|---|---|
| common- | imported by ALL hosts | common-doas.nix, common-nix.nix, common-shell-fish.nix |
| desktop- | imported by mreow + yarn only | desktop-common.nix, desktop-steam.nix, desktop-networkmanager.nix |
| server- | imported by muffin only | server-security.nix, server-power.nix, server-impermanence.nix, server-lanzaboote-agenix.nix |
| (none) | host-specific or shared by a subset of hosts; scope is implied by the filename (see file contents) | zfs.nix, no-rgb.nix (yarn + muffin) |

New modules: pick the narrowest prefix that's true, then add the import explicitly in the host's default.nix (there is no auto-discovery).
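
For example, registering modules in a host entrypoint might look like the excerpt below. It is hypothetical: the list is illustrative and is not yarn's actual import set.

```nix
# hosts/yarn/default.nix (illustrative excerpt)
{ ... }:
{
  imports = [
    ./impermanence.nix
    ../../modules/common-nix.nix
    ../../modules/desktop-common.nix
    ../../modules/desktop-steam.nix
    # bare module, shared by a subset of hosts (yarn + muffin)
    ../../modules/no-rgb.nix
  ];
}
```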

Code style

  • Formatter: nixfmt-tree (declared in flake.nix). Run nix fmt before every commit.
  • Indentation: 2 spaces, enforced by the formatter.
  • Function args: one per line, trailing comma, always end with ...:
    {
      config,
      lib,
      pkgs,
      username,
      ...
    }:
    
  • Imports: relative paths, one per line. Use the ../../modules/ style from hosts/; do not invent new aggregator modules unless more than one host uses the aggregation.
  • Package paths: prefer lib.getExe pkgs.foo over "${pkgs.foo}/bin/foo" when the derivation declares meta.mainProgram.
  • Unfree packages: allowlisted per-module via nixpkgs.config.allowUnfreePredicate. Do not add a global permit.
  • Comments: lowercase, # style. Use # TODO! / # BUG! / # FIX: prefixes for known issues that should be searchable.
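  • Commas: Nix only uses them between function arguments, where a trailing comma is allowed (as above). Lists are whitespace-separated and attribute bindings end with ;, so never put commas in either.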
  • lib.mkDefault / lib.mkForce: prefer mkDefault in shared modules so hosts can override with a plain assignment instead of fighting priority; use mkForce only to beat a normal-priority definition you can't change any other way.
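
The following module sketch shows the getExe and mkDefault conventions together. The inline import stands in for a shared module. The fstrim and example-job settings are arbitrary illustrations, not options this repo is known to set.

```nix
{
  lib,
  pkgs,
  ...
}:
{
  imports = [
    # inline stand-in for a shared modules/desktop-*.nix file
    { services.fstrim.enable = lib.mkDefault true; }
  ];

  # host side: a plain definition (priority 100) beats mkDefault (priority 1000),
  # so lib.mkForce (priority 50) is not needed here
  services.fstrim.enable = false;

  systemd.services.example-job = {
    # lib.getExe reads pkgs.hello's meta.mainProgram, yielding .../bin/hello
    serviceConfig.ExecStart = lib.getExe pkgs.hello;
    # for a binary other than mainProgram, use e.g. lib.getExe' pkgs.procps "ps"
  };
}
```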

Secrets

  • git-crypt covers secrets/** per the root .gitattributes. Initialized with a single symmetric key checked into secrets/server/git-crypt-key-nixos.age (agenix-encrypted to the USB SSH identity).
  • agenix decrypts *.age into /run/agenix/ at activation on every host:
    • muffin: identity is /mnt/usb-secrets/usb-secrets-key (ssh-ed25519 on a physical USB). Wired in modules/usb-secrets.nix.
    • mreow + yarn: identity is /var/lib/agenix/tpm-identity (an age-plugin-tpm handle sealed by the host's TPM 2.0). Wired in modules/desktop-age-secrets.nix; yarn persists /var/lib/agenix through impermanence.
  • Recipients are declared in secrets/secrets.nix. Desktop secrets are encrypted to the admin SSH key + each host's TPM recipient; server secrets stay encrypted to the muffin USB key.
  • Bootstrap a new desktop: run doas scripts/bootstrap-desktop-tpm.sh on the host. It generates a TPM-sealed identity at /var/lib/agenix/tpm-identity and prints an age1tpm1… recipient. Append it to the tpm list in secrets/secrets.nix, run agenix -r to re-encrypt, commit, then run ./deploy.sh switch.
  • Encrypting a new server secret uses the SSH public key directly with age -R. Run it from bash: <(…) is bash process substitution, and fish needs (… | psub) instead:
    age -R <(ssh-keygen -y -f secrets/usb-secrets/usb-secrets-key) \
        -o secrets/server/<name>.age \
        /path/to/plaintext
    
    For desktop secrets, prefer agenix -e secrets/desktop/<name>.age from a shell with age-plugin-tpm on PATH — it reads secrets/secrets.nix and encrypts to every recipient listed there.
  • DO NOT use ssh-to-age. It produces X25519 recipient stanzas, which the SSH private key on muffin cannot decrypt (it only decrypts ssh-ed25519 stanzas produced by age -R against the SSH pubkey). Mismatched stanzas show up as age: error: no identity matched any of the recipients at deploy time.
  • Never read or commit plaintext secrets. Never log secret values.
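
Below is a hypothetical secrets/secrets.nix in the standard agenix shape, where each .age path maps to the public keys that can decrypt it. Keys and file names are placeholders, and the real file's variable names may differ.

```nix
let
  # placeholder: the admin's SSH public key
  admin = "ssh-ed25519 AAAAC3Nza...placeholder admin";
  # placeholder: public half of muffin's USB-resident SSH identity
  muffinUsb = "ssh-ed25519 AAAAC3Nza...placeholder usb-secrets";
  # one recipient per desktop, as printed by scripts/bootstrap-desktop-tpm.sh
  tpm = [
    "age1tpm1...placeholder-mreow"
    "age1tpm1...placeholder-yarn"
  ];
in
{
  # desktop secrets: admin key plus every desktop TPM
  "desktop/example-token.age".publicKeys = [ admin ] ++ tpm;
  # server secrets: the muffin USB key only
  "server/example-password.age".publicKeys = [ muffinUsb ];
}
```

A host then points age.secrets.<name>.file at the .age file, and agenix decrypts it to /run/agenix/<name> at activation.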

Service pattern (muffin)

Each file under services/ follows this shape:

  1. imports block with lib.serviceMountWithZpool and (optionally) lib.serviceFilePerms.
  2. The service configuration (services.<name> = { … }).
  3. Caddy reverse-proxy vhost (usually via lib.mkCaddyReverseProxy in lib/default.nix).
  4. Firewall rules (networking.firewall.allowed{TCP,UDP}Ports) if externally reachable.
  5. services.fail2ban.jails.<name> if the service authenticates users.

Custom lib helpers (in lib/default.nix) to prefer over reinventing:

  • lib.serviceMountWithZpool <service> <zpool> [dirs]
  • lib.serviceFilePerms <service> [tmpfilesRules]
  • lib.optimizePackage <pkg> — applies -O3 -march=znver3 -mtune=znver3
  • lib.vpnNamespaceOpenPort <port> <service> — confines service to the WireGuard namespace
  • lib.mkCaddyReverseProxy { subdomain|domain, port, auth ? false, vpn ? false }
  • lib.mkFail2banJail { name, unitName ? "${name}.service", failregex }
  • lib.mkGrafanaAnnotationService { name, description, script, after ? [], environment ? {}, loadCredential ? null }
  • lib.extractArrApiKey <configXmlPath> — shell snippet to read the <ApiKey> element
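
Here is a skeleton of a services/ file that follows the five steps above. services.example, the tank zpool, and port 8123 are placeholders. The helper's argument shape is read off the signature listed here and has not been checked against lib/default.nix.

```nix
# services/example.nix: hypothetical skeleton
{ lib, ... }:
let
  # placeholder; in the real tree, ports come from hosts/muffin/service-configs.nix
  port = 8123;
in
{
  # 1. ZFS-backed state (argument shape assumed from the helper signature)
  imports = [
    (lib.serviceMountWithZpool "example" "tank" [ "/var/lib/example" ])
  ];

  # 2. the service itself; services.example is a stand-in, not a real module
  services.example = {
    enable = true;
    inherit port;
  };

  # 3. reverse proxy written with the stock Caddy option;
  #    the repo would normally use lib.mkCaddyReverseProxy instead
  services.caddy.virtualHosts."example.sigkill.computer".extraConfig = ''
    reverse_proxy 127.0.0.1:${toString port}
  '';

  # 4. no firewall entry: a private port is reached only through Caddy
  # 5. add services.fail2ban.jails.example (see lib.mkFail2banJail) if it has logins
}
```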

Hard requirements (the two port rules are asserted at eval time):

  • Port uniqueness: every port in hosts/muffin/service-configs.nix ports.{public,private} must be unique. The flake asserts this.
  • Public/private segregation: public ports must appear in the firewall allow-list; private ports must not. The flake asserts both directions.
  • Hugepages: services that need 2 MiB hugepages declare their budget in service-configs.nix under hugepages_2m.services. The vm.nr_hugepages sysctl is derived from the total.
  • PostgreSQL-first: any service that supports PostgreSQL uses it (via a peer-auth Unix socket when possible). Avoid per-service SQLite (or similar embedded databases).
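
One way to express the port and hugepages rules is sketched below as a NixOS module over a made-up excerpt of service-configs.nix. The repo's actual assertions may live in flake.nix and be structured differently. The hugepages budgets are assumed to be counts of 2 MiB pages.

```nix
{
  config,
  lib,
  ...
}:
let
  # made-up excerpt of hosts/muffin/service-configs.nix
  serviceConfigs = {
    ports = {
      public = {
        https = 443;
      };
      private = {
        jellyfin = 8096;
        gitea = 3000;
      };
    };
    # per-service budgets, assumed to be counts of 2 MiB pages
    hugepages_2m.services = {
      example = 512;
    };
  };

  publicPorts = lib.attrValues serviceConfigs.ports.public;
  privatePorts = lib.attrValues serviceConfigs.ports.private;
  allPorts = publicPorts ++ privatePorts;
  fw = config.networking.firewall;
  allowed = fw.allowedTCPPorts ++ fw.allowedUDPPorts;
in
{
  assertions = [
    {
      assertion = builtins.length (lib.unique allPorts) == builtins.length allPorts;
      message = "service-configs.nix: a port is used more than once";
    }
    {
      assertion = lib.all (p: lib.elem p allowed) publicPorts;
      message = "service-configs.nix: a public port is missing from the firewall allow-list";
    }
    {
      assertion = lib.all (p: !(lib.elem p allowed)) privatePorts;
      message = "service-configs.nix: a private port is open in the firewall";
    }
  ];

  # derived rather than hand-written: the sum of every declared budget
  boot.kernel.sysctl."vm.nr_hugepages" = builtins.foldl' (a: b: a + b) 0 (
    lib.attrValues serviceConfigs.hugepages_2m.services
  );
}
```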

Deploy guard (muffin)

modules/server-deploy-guard.nix aggregates per-service "is anyone using this right now?" checks into a single deploy-guard-check binary on muffin. Enforcement is preflight-only: the guard runs over SSH before deploy-rs is invoked, and activation itself is never gated. This matters because deploy-rs sets the new profile pointer before running the activation script. A failed activation therefore triggers auto-rollback, which re-runs switch-to-configuration on the previous generation. That re-activation rotates agenix secrets, reinstalls lanzaboote, and reloads systemd units. The only safe place to stop a deploy is before deploy-rs starts.

Two drivers invoke the preflight:

  • ./deploy.sh muffin SSHes to server-public and runs deploy-guard-check. SSH connection failure is a hard abort (rc=255) because there is no second gate. ./deploy.sh muffin --force (or DEPLOY_GUARD_FORCE=1 ./deploy.sh muffin) skips the preflight entirely.
  • CI (.gitea/workflows/deploy.yml) has a Deploy guard preflight step between Build muffin and Deploy via deploy-rs. A non-zero exit fails the job before any closure copy or activation.

Adding a new check

In the service's own file (or a sibling <service>-deploy-guard.nix):

{ config, lib, pkgs, ... }:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-<service>";
    runtimeInputs = [ /* curl, jq, etc. */ ];
    text = ''
      # exit 0 when the service is idle / unreachable (soft-fail)
      # exit 1 with a reason on stdout/stderr when live users would be disrupted
    '';
  };
in
lib.mkIf config.services.<service>.enable {
  services.deployGuard.checks.<service> = {
    description = "Active <service> users";
    command = check;
  };
}

Existing registrations live in services/jellyfin/jellyfin-deploy-guard.nix (REST /Sessions via curl+jq) and services/minecraft-deploy-guard.nix (Server List Ping via mcstatus). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.
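
The sketch below fills in the check body to show the soft-fail convention. The /api/sessions endpoint on port 8123 and its JSON-array response are assumptions standing in for a real service API. The registration mirrors the template above, without the enable guard.

```nix
{ pkgs, ... }:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-example";
    runtimeInputs = [
      pkgs.curl
      pkgs.jq
    ];
    text = ''
      # soft-fail: an unreachable service has no users to disrupt
      if ! body=$(curl -fsS --max-time 5 http://127.0.0.1:8123/api/sessions); then
        echo "example: unreachable, treating as idle"
        exit 0
      fi

      # an unparseable response is also treated as idle rather than blocking
      if ! active=$(jq 'length' <<<"$body"); then
        echo "example: unexpected response, treating as idle"
        exit 0
      fi

      if [ "$active" -gt 0 ]; then
        echo "example: $active active session(s)"
        exit 1
      fi
    '';
  };
in
{
  services.deployGuard.checks.example = {
    description = "Active example users";
    command = check;
  };
}
```

Treating an unparseable response as idle is a design choice in this sketch. A stricter check could exit 1 there instead.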

Deploy finalize (muffin)

modules/server-deploy-finalize.nix solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct switch-to-configuration switch restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).

The fix is a two-phase activation wired into deploy.nodes.muffin.profiles.system.path in flake.nix:

  1. switch-to-configuration boot: bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
  2. deploy-finalize: schedules a detached systemd-run --on-active=N transient unit (default 60s). The unit is owned by PID 1, so it survives the eventual runner restart. If /run/booted-system/{kernel,initrd,kernel-modules} differs from the new profile's, the unit runs systemctl reboot; otherwise it runs switch-to-configuration switch.

In other words, the reboot is gated dynamically: it happens only when the kernel, initrd, or kernel modules changed. The 60s delay is tuned so the CI job (or a manual ./deploy.sh muffin) has time to emit status/notification steps before the runner is recycled.

Back-to-back deploys supersede each other: each invocation cancels any still-pending deploy-finalize-*.timer before scheduling its own. deploy-finalize --dry-run prints the decision without scheduling anything — useful when debugging.
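
This is a minimal sketch of the finalize decision and scheduling logic. It assumes the new system profile path is passed as the first argument. The command name deploy-finalize-sketch and its argument order are hypothetical, and this is not the contents of modules/server-deploy-finalize.nix.

```nix
{ pkgs, ... }:
let
  finalize = pkgs.writeShellApplication {
    name = "deploy-finalize-sketch";
    runtimeInputs = [
      pkgs.coreutils
      pkgs.systemd
    ];
    text = ''
      profile="''${1:?usage: deploy-finalize-sketch PROFILE [--dry-run]}"
      dry_run="''${2:-}"

      # reboot only if kernel, initrd or kernel modules differ from the booted system
      action=switch
      for p in kernel initrd kernel-modules; do
        if [ "$(readlink -f "/run/booted-system/$p")" != "$(readlink -f "$profile/$p")" ]; then
          action=reboot
        fi
      done

      if [ "$dry_run" = "--dry-run" ]; then
        echo "would $action"
        exit 0
      fi

      # supersede any finalize still pending from an earlier deploy
      systemctl stop 'deploy-finalize-*.timer' || true

      if [ "$action" = reboot ]; then
        cmd=("$(command -v systemctl)" reboot)
      else
        cmd=("$profile/bin/switch-to-configuration" switch)
      fi

      # transient timer + service owned by PID 1, so it outlives the CI runner
      systemd-run --on-active=60 --unit="deploy-finalize-$(date +%s)" "''${cmd[@]}"
    '';
  };
in
{
  environment.systemPackages = [ finalize ];
}
```

With this sketch, deploy-finalize-sketch /nix/var/nix/profiles/system --dry-run prints "would switch" or "would reboot" and schedules nothing.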

Prior art: the 3-path {kernel,initrd,kernel-modules} diff is lifted from nixpkgs's system.autoUpgrade module (the allowReboot = true branch) and was packaged the same way in obsidiansystems/obelisk#957. nixpkgs#185030 tracks lifting it into switch-to-configuration proper but has been stale since 2025-07. The self-deploy systemd-run detachment is the proposed fix from deploy-rs#153, also unmerged upstream.
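
The sketch below shows the two-phase wiring through deploy-rs's activate.custom. That function takes a base derivation and an activation script, and the script runs with $PROFILE pointing at the new profile. The hostname and the bare deploy-finalize invocation are assumptions. The fragment also assumes deploy-rs and self are in scope in flake.nix's outputs.

```nix
# flake.nix outputs excerpt (hypothetical)
deploy.nodes.muffin = {
  hostname = "server-public"; # placeholder address
  profiles.system = {
    user = "root";
    path =
      deploy-rs.lib.x86_64-linux.activate.custom self.nixosConfigurations.muffin.config.system.build.toplevel
        ''
          # phase 1: new boot entry only; nothing restarts, so the runner and SSH survive
          $PROFILE/bin/switch-to-configuration boot
          # phase 2: schedule the detached switch-or-reboot (invocation details assumed)
          $PROFILE/sw/bin/deploy-finalize
        '';
  };
};
```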

Technical details

  • Privilege escalation: doas everywhere; sudo is disabled on every host.
  • Shell: fish. bash login shells re-exec into fish via programs.bash.interactiveShellInit (see modules/common-shell-fish.nix).
  • Secure boot: lanzaboote. Every host extracts keys from an agenix-decrypted tar at activation — desktops via modules/desktop-lanzaboote-agenix.nix, muffin via modules/server-lanzaboote-agenix.nix.
  • Impermanence: muffin is tmpfs-root with /persistent surviving reboots (modules/server-impermanence.nix); yarn binds /home/primary from /persistent (hosts/yarn/impermanence.nix).
  • Disks: disko.
  • Binary cache: muffin runs harmonia; desktops consume it at https://nix-cache.sigkill.computer.
  • Kernel:
    • Desktops: linux-cachyos-bore-lto, processorOpt = "x86_64-v3" (see modules/desktop-common.nix — also trims ~80 legacy subsystems).
    • muffin: linuxPackages_6_12 (pinned; 6.18 has a ZFS deadlock in dbuf_evict).
  • Domain: sigkill.computer. The old gardling.com redirects automatically.
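
The shim below is the widely used NixOS-wiki form, adapted to the lib.getExe convention. modules/common-shell-fish.nix may differ in detail.

```nix
{
  lib,
  pkgs,
  ...
}:
{
  programs.fish.enable = true;

  # re-exec interactive bash into fish unless the parent is already fish
  # or bash is running a command string (bash -c ...); keep --login for login shells
  programs.bash.interactiveShellInit = ''
    if [[ $(${lib.getExe' pkgs.procps "ps"} --no-header --pid=$PPID --format=comm) != "fish" && -z ''${BASH_EXECUTION_STRING} ]]
    then
      shopt -q login_shell && LOGIN_OPTION='--login' || LOGIN_OPTION=""
      exec ${lib.getExe pkgs.fish} $LOGIN_OPTION
    fi
  '';
}
```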

Agent-specific instructions

  • If instructed to commit, disable GPG signing (git commit --no-gpg-sign). The author's GPG key is not available in this environment.
  • Use nix-shell -p <package> if a tool is missing from the environment.
  • For nix build, always append -L for verbose logs.
  • If Nix reports a missing file, run git add <file> first — flakes only see git-tracked files.
  • Do not read files under secrets/.
  • Run nix fmt after editing any .nix file.
  • Validate every change with nix build .#nixosConfigurations.<host>.config.system.build.toplevel -L.
  • Commit messages are terse, lowercase; prefix with <scope>: when narrowly scoped (caddy: add redirect, zfs: remove unneeded options, mreow: bump kernel). Generic changes use update or a short description.