AGENTS.md
Project Overview
Unified NixOS flake for three hosts:
| Host | Role | nixpkgs channel | Activation |
|---|---|---|---|
| mreow | Framework 13 AMD AI 300 laptop (niri, greetd, swaylock) | nixos-unstable | ./deploy.sh locally |
| yarn | AMD Zen 5 desktop (niri + Jovian-NixOS Steam deck mode, impermanence) | nixos-unstable | pull from CI binary cache |
| muffin | AMD Zen 3 server (Caddy, ZFS, agenix, deploy-rs, 25+ services) | nixos-25.11 | deploy-rs from CI |
One flake.nix declares both channels (nixpkgs and nixpkgs-stable) and composes each host from the correct channel. No single-channel migration is intended.
History pre-dating this repo lives in the merged subtree branches from dotfiles (commit e9a44f6) and server-config (commit 4bc5d57). Use git log <path> (without --follow) and traverse back through the merge commits dc481c2 and 6448a04 for pre-unify history.
Layout
flake.nix # 3 hosts, 2 channels
deploy.sh # wrapper: current-host rebuild or `muffin` deploy-rs
hosts/<host>/ # host entrypoints (default.nix, home.nix, disk.nix, …)
modules/ # flat namespace; see module naming below
  common.nix # imported by ALL hosts (nix settings, doas, fish shim)
  desktop-*.nix # imported by mreow/yarn only
  server-*.nix # imported by muffin only
  <bare>.nix # scoped by filename (age-secrets, zfs, no-rgb, …)
home/
  profiles/{gui,desktop,no-gui}.nix # home-manager profiles
  progs/<program>.nix # one file per program (fish, helix, niri, zen/, emacs, …)
  util/<helper>.nix # small derivations
services/ # muffin-only: caddy, jellyfin, gitea, matrix, monero, …
tests/ # pkgs.testers.runNixOSTest suite
lib/
  default.nix # extends nixpkgs-stable.lib with mkCaddyReverseProxy, serviceMountWithZpool, …
  overlays.nix # jellyfin-exporter, igpu-exporter, reflac, ensureZfsMounts
patches/nixpkgs/ # applied to nixpkgs-stable for muffin builds
secrets/
  desktop/ # git-crypt: mreow + yarn share these (wifi, nix-cache-netrc, secureboot.tar, password-hash, disk-password)
  home/ # git-crypt: per-user HM secrets (api keys, steam id)
  server/ # agenix *.age + git-crypt *.nix/*.tar/livekit_keys
  usb-secrets/ # USB-resident agenix identity key (git-crypt inside the repo)
Never read or write files under secrets/. They are encrypted at rest (git-crypt for plaintext, agenix for .age). The git-crypt key is delivered to muffin at runtime as /run/agenix/git-crypt-key-nixos.age.
Build & Deploy
# --- from any host ---
nix fmt # nixfmt-tree
nix flake update # bump both channels + inputs
nix flake update nixpkgs # bump just desktops' channel
nix flake update nixpkgs-stable # bump just muffin's channel
# --- per-host eval / build (add -L for verbose logs) ---
nix build .#nixosConfigurations.mreow.config.system.build.toplevel -L
nix build .#nixosConfigurations.yarn.config.system.build.toplevel -L
nix build .#nixosConfigurations.muffin.config.system.build.toplevel -L
# --- quick eval without building ---
nix eval .#nixosConfigurations.muffin.config.system.build.toplevel 2>&1 | head -5
# --- activate on current host (mreow / yarn only) ---
./deploy.sh # boot (default; next reboot)
./deploy.sh switch # apply immediately
./deploy.sh test # apply without boot entry
./deploy.sh build # build only
# --- deploy to muffin from anywhere ---
./deploy.sh muffin
# equivalent to:
nix run .#deploy -- .#muffin
# --- tests (muffin) ---
nix build .#packages.x86_64-linux.tests -L # all tests (slow)
nix build .#test-zfsTest -L # one test by name
# test names are the keys of tests/tests.nix; pattern is test-<name>
Desktop configs have no unit tests. Validation is a zero nix build exit code plus a successful nix-diff against the previous generation.
If Nix complains about a missing file, git add it first — flakes only see tracked files.
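The per-host build commands above all share one attribute pattern, so validating every host can be sketched as a small loop. This is an illustration only: the helper names `toplevel_attr` and `build_all_hosts` are hypothetical and are not part of deploy.sh.

```sh
# hypothetical helpers, not part of this repo's deploy.sh

# print the flake attribute for a host's system closure
toplevel_attr() {
  printf '.#nixosConfigurations.%s.config.system.build.toplevel\n' "$1"
}

# build every host in turn, stopping at the first failure
build_all_hosts() {
  for host in mreow yarn muffin; do
    nix build "$(toplevel_attr "$host")" -L || return 1
  done
}
```

Calling `build_all_hosts` from the repo root would build the three closures one after another; a non-zero status means at least one host failed to build.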
Module naming
| Prefix | Meaning | Example |
|---|---|---|
| common- | imported by ALL hosts | common-doas.nix, common-nix.nix, common-shell-fish.nix |
| desktop- | imported by mreow + yarn only | desktop-common.nix, desktop-steam.nix, desktop-networkmanager.nix |
| server- | imported by muffin only | server-security.nix, server-power.nix, server-impermanence.nix, server-lanzaboote-agenix.nix |
| (none) | host-specific filename-scoped; see file contents | age-secrets.nix, zfs.nix, no-rgb.nix (yarn + muffin) |
New modules: pick the narrowest prefix that's true, then add the import explicitly in the host's default.nix (there is no auto-discovery).
Code style
- Formatter: nixfmt-tree (declared in flake.nix). Run nix fmt before every commit.
- Indentation: 2 spaces, enforced by the formatter.
- Function args: one per line, each followed by a comma, always ending with ..., for example { config, lib, pkgs, username, ... }:
- Imports: relative paths, one per line. Use the ../../modules/ style from hosts/; do not invent new aggregator modules unless more than one host uses the aggregation.
- Package paths: lib.getExe pkgs.foo over "${pkgs.foo}/bin/foo" when the derivation declares meta.mainProgram.
- Unfree packages: allowlisted per-module via nixpkgs.config.allowUnfreePredicate. Do not add a global permit.
- Comments: lowercase, # style. Use # TODO! / # BUG! / # FIX: prefixes for known issues that should be searchable.
- No trailing commas (Nix syntax forbids them).
- lib.mkDefault / lib.mkForce: prefer mkDefault in shared modules so hosts can override without fighting priority; use mkForce only to beat inherited defaults you can't reach any other way.
Secrets
- git-crypt covers secrets/** per the root .gitattributes. Initialized with a single symmetric key checked into secrets/server/git-crypt-key-nixos.age (agenix-encrypted to the USB SSH identity).
- agenix decrypts secrets/server/*.age at activation into /run/agenix/ on muffin.
- USB identity: /mnt/usb-secrets/usb-secrets-key on muffin; the age identity path is wired in modules/usb-secrets.nix.
- Encrypting a new agenix secret uses the SSH public key directly with age -R:
    age -R <(ssh-keygen -y -f secrets/usb-secrets/usb-secrets-key) \
      -o secrets/server/<name>.age \
      /path/to/plaintext
- DO NOT use ssh-to-age. It produces X25519 recipient stanzas, which the SSH private key on muffin cannot decrypt (it only decrypts ssh-ed25519 stanzas produced by age -R against the SSH pubkey). Mismatched stanzas show up as age: error: no identity matched any of the recipients at deploy time.
- Never read or commit plaintext secrets. Never log secret values.
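When a deploy fails with the "no identity matched" error, a maintainer can confirm a stanza mismatch by looking at the recipient types in the file header. The sketch below assumes a binary (non-armored) age v1 file, whose plaintext header carries one `-> TYPE ...` line per recipient before the `---` MAC line. The function name `age_stanza_types` is hypothetical. Only the header is read, so no secret value is printed.

```sh
# hypothetical debugging helper; assumes a non-armored age v1 file
# prints the stanza type (e.g. ssh-ed25519 or X25519) of every recipient
age_stanza_types() {
  # stop at the "---" MAC line so the encrypted payload is never parsed
  awk '/^---/ { exit } /^-> / { print $2 }' "$1"
}
```

A file encrypted as described above should report only `ssh-ed25519`; an `X25519` line would indicate that a converted key was used as the recipient.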
Service pattern (muffin)
Each file under services/ follows this shape:
- imports block with lib.serviceMountWithZpool and (optionally) lib.serviceFilePerms.
- The service configuration (services.<name> = { … }).
- Caddy reverse-proxy vhost (usually via lib.mkCaddyReverseProxy in lib/default.nix).
- Firewall rules (networking.firewall.allowed{TCP,UDP}Ports) if externally reachable.
- services.fail2ban.jails.<name> if the service authenticates users.
Custom lib helpers (in lib/default.nix) to prefer over reinventing:
- lib.serviceMountWithZpool <service> <zpool> [dirs]
- lib.serviceFilePerms <service> [tmpfilesRules]
- lib.optimizePackage <pkg>: applies -O3 -march=znver3 -mtune=znver3
- lib.vpnNamespaceOpenPort <port> <service>: confines service to the WireGuard namespace
- lib.mkCaddyReverseProxy { subdomain|domain, port, auth ? false, vpn ? false }
- lib.mkFail2banJail { name, unitName ? "${name}.service", failregex }
- lib.mkGrafanaAnnotationService { name, description, script, after ? [], environment ? {}, loadCredential ? null }
- lib.extractArrApiKey <configXmlPath>: shell snippet to read the <ApiKey> element
Hard requirements that are asserted at eval time:
- Port uniqueness: every port in hosts/muffin/service-configs.nix ports.{public,private} must be unique. The flake asserts this.
- Public/private segregation: public ports must appear in the firewall allow-list; private ports must not. The flake asserts both directions.
- Hugepages: services that need 2 MiB hugepages declare their budget in service-configs.nix under hugepages_2m.services. The vm.nr_hugepages sysctl is derived from the total.
- PostgreSQL-first: any service that supports PostgreSQL uses it (via peer-auth Unix socket when possible). Per-service SQLite (or similar) is discouraged.
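The port invariants above can be illustrated outside of Nix. The real checks are assertions inside the flake and are not reproduced here; the following is only the same logic restated in shell, with hypothetical function names and made-up port numbers.

```sh
# illustration only; the flake's own assertions are written in Nix

# print each port that occurs more than once in the arguments
duplicate_ports() {
  printf '%s\n' "$@" | sort -n | uniq -d
}

# print public ports that are absent from the firewall allow-list
# $1: space-separated public ports, $2: space-separated allowed ports
unlisted_public_ports() {
  for port in $1; do
    case " $2 " in
      *" $port "*) ;;          # present in the allow-list
      *) echo "$port" ;;       # would violate the segregation rule
    esac
  done
}
```

Empty output from both functions corresponds to the passing case. The private-port direction is the mirror image: any private port that does appear in the allow-list is a violation.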
Deploy guard (muffin)
modules/server-deploy-guard.nix aggregates per-service "is anyone using this right now?" checks into a single deploy-guard-check binary on muffin. Enforcement is preflight-only — the guard runs over SSH before deploy-rs is invoked; activation itself is never gated. This matters because deploy-rs sets the new profile pointer before running the activation script, so a failed activation triggers auto-rollback which re-runs switch-to-configuration on the previous generation — that re-activation rotates agenix secrets, reinstalls lanzaboote, and reloads systemd units. The only safe place to stop a deploy is before deploy-rs starts.
Two drivers invoke the preflight:
- ./deploy.sh muffin SSHes to server-public and runs deploy-guard-check. SSH connection failure is a hard abort (rc=255) because there is no second gate. ./deploy.sh muffin --force (or DEPLOY_GUARD_FORCE=1 ./deploy.sh muffin) skips the preflight entirely.
- CI (.gitea/workflows/deploy.yml) has a Deploy guard preflight step between Build muffin and Deploy via deploy-rs. A non-zero exit fails the job before any closure copy or activation.
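The gating behaviour of the local driver can be sketched as follows. The actual contents of deploy.sh are not shown in this document, so the function name `preflight` and its messages are assumptions; only the force switch, the rc=255 hard abort, and the "non-zero blocks the deploy" rule come from the text above.

```sh
# hypothetical sketch of the preflight gate; not the real deploy.sh
# usage: preflight <command that runs the guard, e.g. ssh server-public deploy-guard-check>
preflight() {
  if [ "${DEPLOY_GUARD_FORCE:-0}" = "1" ]; then
    echo "deploy guard: skipped (forced)" >&2
    return 0
  fi
  rc=0
  "$@" || rc=$?
  case "$rc" in
    0)   return 0 ;;
    255) echo "deploy guard: ssh failed, aborting (no second gate)" >&2; return 255 ;;
    *)   echo "deploy guard: blocked (rc=$rc)" >&2; return "$rc" ;;
  esac
}
```

Under these assumptions a deploy would be chained as `preflight ssh server-public deploy-guard-check && nix run .#deploy -- .#muffin`, so that deploy-rs is never started when the guard objects.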
Adding a new check
In the service's own file (or a sibling <service>-deploy-guard.nix):
{ config, lib, pkgs, ... }:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-<service>";
    runtimeInputs = [ /* curl, jq, etc. */ ];
    text = ''
      # exit 0 when the service is idle / unreachable (soft-fail)
      # exit 1 with a reason on stdout/stderr when live users would be disrupted
    '';
  };
in
lib.mkIf config.services.<service>.enable {
  services.deployGuard.checks.<service> = {
    description = "Active <service> users";
    command = check;
  };
}
Existing registrations live in services/jellyfin/jellyfin-deploy-guard.nix (REST /Sessions via curl+jq) and services/minecraft-deploy-guard.nix (Server List Ping via mcstatus). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.
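The exit-code contract of a check body can be sketched like this. It is a minimal illustration, not the code of the existing Jellyfin or Minecraft checks: the helper name `guard_decision` is hypothetical, and treating unparsable probe output as "unreachable" is an assumption made here to keep the soft-fail behaviour.

```sh
# hypothetical helper for the text of a deploy-guard check
# $1: number of active sessions, or empty when the probe could not reach the service
guard_decision() {
  active="$1"
  case "$active" in
    ''|*[!0-9]*) return 0 ;;   # unreachable or unparsable: soft-fail, allow the deploy
  esac
  if [ "$active" -eq 0 ]; then
    return 0                    # service is up but idle
  fi
  echo "$active active session(s), refusing deploy"
  return 1
}
```

A check would feed it the output of its own probe, for example `guard_decision "$(curl -fsS --max-time 5 "$SESSIONS_URL" | jq 'length' || true)"`, where `SESSIONS_URL` is a placeholder for whatever endpoint the service exposes.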
Deploy finalize (muffin)
modules/server-deploy-finalize.nix solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct switch-to-configuration switch restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).
The fix is a two-phase activation wired into deploy.nodes.muffin.profiles.system.path in flake.nix:
- switch-to-configuration boot: bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
- deploy-finalize: schedules a detached systemd-run --on-active=N transient unit (default 60s). The unit is owned by pid1, so it survives the eventual runner restart. If /run/booted-system/{kernel,initrd,kernel-modules} differs from the new profile's, the unit runs systemctl reboot; otherwise it runs switch-to-configuration switch.
That is, reboot is dynamically gated on kernel/initrd/kernel-modules change. The 60s delay is tuned so the CI job (or manual ./deploy.sh muffin) has time to emit status/notification steps before the runner is recycled.
Back-to-back deploys supersede each other: each invocation cancels any still-pending deploy-finalize-*.timer before scheduling its own. deploy-finalize --dry-run prints the decision without scheduling anything — useful when debugging.
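The reboot-or-switch decision described above can be sketched in shell. This is not the source of deploy-finalize: the function names are hypothetical, the two directories are passed as arguments for illustration, and `readlink -f` is assumed to be the GNU coreutils version.

```sh
# hypothetical sketch of the decision step; not the real deploy-finalize
# $1: booted system root (e.g. /run/booted-system), $2: new system profile path
needs_reboot() {
  for part in kernel initrd kernel-modules; do
    # compare the resolved store paths behind each symlink
    if [ "$(readlink -f "$1/$part")" != "$(readlink -f "$2/$part")" ]; then
      return 0
    fi
  done
  return 1
}

# print the action the finalize unit would take, similar in spirit to --dry-run
finalize_action() {
  if needs_reboot "$1" "$2"; then
    echo reboot
  else
    echo switch
  fi
}
```

Under these assumptions, `finalize_action /run/booted-system <new-profile>` prints `reboot` when any of the three paths changed and `switch` otherwise.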
Prior art: the 3-path {kernel,initrd,kernel-modules} diff is lifted from nixpkgs's system.autoUpgrade module (the allowReboot = true branch) and was packaged the same way in obsidiansystems/obelisk#957. nixpkgs#185030 tracks lifting it into switch-to-configuration proper but has been stale since 2025-07. The self-deploy systemd-run detachment is the proposed fix from deploy-rs#153, also unmerged upstream.
Technical details
- Privilege escalation: doas everywhere; sudo is disabled on every host.
- Shell: fish. bash login shells re-exec into fish via programs.bash.interactiveShellInit (see modules/common-shell-fish.nix).
- Secure boot: lanzaboote. Desktops extract keys from secrets/desktop/secureboot.tar; muffin extracts from an agenix-decrypted tar (see modules/server-lanzaboote-agenix.nix).
- Impermanence: muffin is tmpfs-root with /persistent surviving reboots (modules/server-impermanence.nix); yarn binds /home/primary from /persistent (hosts/yarn/impermanence.nix).
- Disks: disko.
- Binary cache: muffin runs harmonia; desktops consume it at https://nix-cache.sigkill.computer.
- Kernel:
  - Desktops: linux-cachyos-bore-lto, processorOpt = "x86_64-v3" (see modules/desktop-common.nix, which also trims ~80 legacy subsystems).
  - muffin: linuxPackages_6_12 (pinned; 6.18 has a ZFS deadlock in dbuf_evict).
- Domain: sigkill.computer. The old gardling.com redirects automatically.
Agent-specific instructions
- If instructed to commit, disable GPG signing (git commit --no-gpg-sign). The author's GPG key is not available in this environment.
- Use nix-shell -p <package> if a tool is missing from the environment.
- For nix build, always append -L for verbose logs.
- If Nix reports a missing file, run git add <file> first; flakes only see git-tracked files.
- Do not read files under secrets/.
- Run nix fmt after editing any .nix file.
- Validate every change with nix build .#nixosConfigurations.<host>.config.system.build.toplevel -L.
- Commit messages are terse, lowercase; prefix with <scope>: when narrowly scoped (caddy: add redirect, zfs: remove unneeded options, mreow: bump kernel). Generic changes use update or a short description.