# AGENTS.md

## Project Overview

Unified NixOS flake for three hosts:

| Host | Role | nixpkgs channel | Activation |
|------|------|-----------------|------------|
| `mreow` | Framework 13 AMD AI 300 laptop (niri, greetd, swaylock) | `nixos-unstable` | `./deploy.sh` locally |
| `yarn` | AMD Zen 5 desktop (niri + Jovian-NixOS Steam Deck mode, impermanence) | `nixos-unstable` | pull from CI binary cache |
| `muffin` | AMD Zen 3 server (Caddy, ZFS, agenix, deploy-rs, 25+ services) | `nixos-25.11` | deploy-rs from CI |

One `flake.nix` declares both channels (`nixpkgs` and `nixpkgs-stable`) and composes each host from the correct channel. No single-channel migration is intended.

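A minimal sketch of how a single flake can carry both channels and pick one per host. It assumes each `hosts/<host>/default.nix` is a plain NixOS module and that all three machines are `x86_64-linux`; the real `flake.nix` additionally wires home-manager, deploy-rs, the extended `lib`, and the `nixpkgs-stable` patches, none of which are shown here.

```nix
{
  inputs = {
    # desktops (mreow, yarn) track unstable
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    # the server (muffin) tracks the stable release
    nixpkgs-stable.url = "github:NixOS/nixpkgs/nixos-25.11";
  };

  outputs =
    {
      self,
      nixpkgs,
      nixpkgs-stable,
      ...
    }:
    let
      # assumption: every host is x86_64-linux
      platform = {
        nixpkgs.hostPlatform = "x86_64-linux";
      };
    in
    {
      nixosConfigurations = {
        mreow = nixpkgs.lib.nixosSystem {
          modules = [
            ./hosts/mreow
            platform
          ];
        };
        yarn = nixpkgs.lib.nixosSystem {
          modules = [
            ./hosts/yarn
            platform
          ];
        };
        # muffin is evaluated with the stable channel's nixosSystem
        muffin = nixpkgs-stable.lib.nixosSystem {
          modules = [
            ./hosts/muffin
            platform
          ];
        };
      };
    };
}
```

Because each host calls the `nixosSystem` of its own input, bumping one input only changes the hosts built from it.
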
History pre-dating this repo lives in the merged subtree branches from `dotfiles` (commit `e9a44f6`) and `server-config` (commit `4bc5d57`). Use `git log <path>` (without `--follow`) and traverse back through the merge commits `dc481c2` and `6448a04` for pre-unify history.

## Layout

```
flake.nix                            # 3 hosts, 2 channels
deploy.sh                            # wrapper: current-host rebuild or `muffin` deploy-rs
hosts/<host>/                        # host entrypoints (default.nix, home.nix, disk.nix, …)
modules/                             # flat namespace; see module naming below
  common-*.nix                       # imported by ALL hosts (nix settings, doas, fish shim)
  desktop-*.nix                      # imported by mreow/yarn only
  server-*.nix                       # imported by muffin only
  <bare>.nix                         # scoped by filename (age-secrets, zfs, no-rgb, …)
home/
  profiles/{gui,desktop,no-gui}.nix  # home-manager profiles
  progs/<program>.nix                # one file per program (fish, helix, niri, zen/, emacs, …)
  util/<helper>.nix                  # small derivations
services/                            # muffin-only: caddy, jellyfin, gitea, matrix, monero, …
tests/                               # pkgs.testers.runNixOSTest suite
lib/
  default.nix                        # extends nixpkgs-stable.lib with mkCaddyReverseProxy, serviceMountWithZpool, …
  overlays.nix                       # jellyfin-exporter, igpu-exporter, reflac, ensureZfsMounts
  patches/nixpkgs/                   # applied to nixpkgs-stable for muffin builds
secrets/
  secrets.nix                        # agenix recipients (who can decrypt each .age)
  desktop/                           # agenix *.age (mreow + yarn) + disk-password (install-time only, git-crypt)
  home/                              # git-crypt: per-user HM secrets (api keys, steam id)
  server/                            # agenix *.age + git-crypt *.nix/*.tar/livekit_keys (muffin)
  usb-secrets/                       # USB-resident agenix identity for muffin (git-crypt inside the repo)
```

**Never read or write files under `secrets/`.** They are encrypted at rest: git-crypt covers the files that would otherwise be plaintext, and agenix covers the `.age` files. The git-crypt key is delivered to `muffin` at runtime as `/run/agenix/git-crypt-key-nixos.age`.

## Build & Deploy

```sh
# --- from any host ---
nix fmt                                       # nixfmt-tree
nix flake update                              # bump both channels + inputs
nix flake update --input-name nixpkgs         # bump just desktops' channel
nix flake update --input-name nixpkgs-stable  # bump just muffin's channel

# --- per-host eval / build (add -L for verbose logs) ---
nix build .#nixosConfigurations.mreow.config.system.build.toplevel -L
nix build .#nixosConfigurations.yarn.config.system.build.toplevel -L
nix build .#nixosConfigurations.muffin.config.system.build.toplevel -L

# --- quick eval without building ---
nix eval .#nixosConfigurations.muffin.config.system.build.toplevel --no-build 2>&1 | head -5

# --- activate on current host (mreow / yarn only) ---
./deploy.sh          # boot (default; next reboot)
./deploy.sh switch   # apply immediately
./deploy.sh test     # apply without boot entry
./deploy.sh build    # build only

# --- deploy to muffin from anywhere ---
./deploy.sh muffin
# after the deploy-guard preflight, this runs:
nix run .#deploy -- .#muffin

# --- tests (muffin) ---
nix build .#packages.x86_64-linux.tests -L  # all tests (slow)
nix build .#test-zfsTest -L                 # one test by name
# test names are the keys of tests/tests.nix; pattern is test-<name>
```

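A sketch of a new muffin test in the shape `pkgs.testers.runNixOSTest` expects. The file name `tests/example.nix`, the openssh payload, and the idea that `tests/tests.nix` maps a key such as `exampleTest` to `import ./example.nix { inherit pkgs; }` are assumptions; check `tests/tests.nix` for the real calling convention. Under the `test-<name>` pattern, that key would build with `nix build .#test-exampleTest -L`.

```nix
# tests/example.nix (hypothetical)
{
  pkgs,
  ...
}:
pkgs.testers.runNixOSTest {
  name = "example";

  # a single VM; a real test would import the muffin modules under test
  nodes.machine =
    { ... }:
    {
      services.openssh.enable = true;
    };

  # python driver script, run against the booted VM
  testScript = ''
    machine.wait_for_unit("sshd.service")
    machine.wait_for_open_port(22)
  '';
}
```
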
No unit tests for desktop configs. Validation is a successful `nix build` plus reviewing the `nix-diff` output against the previous generation.

If Nix complains about a missing file, `git add` it first — flakes only see tracked files.

## Module naming

| Prefix | Meaning | Example |
|--------|---------|---------|
| `common-` | imported by ALL hosts | `common-doas.nix`, `common-nix.nix`, `common-shell-fish.nix` |
| `desktop-` | imported by mreow + yarn only | `desktop-common.nix`, `desktop-steam.nix`, `desktop-networkmanager.nix` |
| `server-` | imported by muffin only | `server-security.nix`, `server-power.nix`, `server-impermanence.nix`, `server-lanzaboote-agenix.nix` |
| *(none)* | scoped by filename; check the file and which hosts import it | `zfs.nix`, `no-rgb.nix` (yarn + muffin) |

New modules: pick the narrowest prefix that's true, then add the import explicitly in the host's `default.nix` (there is no auto-discovery).

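For illustration, explicit imports in a host entrypoint might look like the following. The module filenames come from the table above; which of them muffin actually imports, and the `./disk.nix` entry, are assumptions rather than a copy of the real file.

```nix
# hosts/muffin/default.nix (illustrative excerpt, not the real import list)
{
  config,
  lib,
  pkgs,
  ...
}:
{
  imports = [
    # host-local files
    ./disk.nix

    # common-*: every host
    ../../modules/common-doas.nix
    ../../modules/common-nix.nix
    ../../modules/common-shell-fish.nix

    # server-*: muffin only
    ../../modules/server-security.nix
    ../../modules/server-impermanence.nix
    ../../modules/server-lanzaboote-agenix.nix

    # bare name, scoped by filename
    ../../modules/zfs.nix
  ];
}
```
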
## Code style

- **Formatter**: `nixfmt-tree` (declared in `flake.nix`). Run `nix fmt` before every commit.
- **Indentation**: 2 spaces, enforced by the formatter.
- **Function args**: one per line, trailing comma, always end with `...`:
  ```nix
  {
    config,
    lib,
    pkgs,
    username,
    ...
  }:
  ```
- **Imports**: relative paths, one per line. Use the `../../modules/` style from `hosts/`; do not invent new aggregator modules unless more than one host uses the aggregation.
- **Package paths**: `lib.getExe pkgs.foo` over `"${pkgs.foo}/bin/foo"` when the derivation declares `meta.mainProgram`.
- **Unfree packages**: allowlisted per-module via `nixpkgs.config.allowUnfreePredicate`. Do not add a global permit.
- **Comments**: lowercase, `#` style. Use `# TODO!` / `# BUG!` / `# FIX:` prefixes for known issues that should be searchable.
- **Commas only in function args**: list elements are separated by whitespace and attribute bindings end with `;`, so a comma inside `[ … ]` or `{ … }` is a syntax error.
- **`lib.mkDefault` / `lib.mkForce`**: prefer `mkDefault` in shared modules so hosts can override without fighting priority; use `mkForce` only to beat inherited defaults you can't reach any other way.

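A hypothetical module that follows the rules above. `pkgs.hello` stands in for any package that sets `meta.mainProgram`, and the module path and `hello-example` unit name are made up for illustration.

```nix
# modules/desktop-example.nix (hypothetical)
{
  config,
  lib,
  pkgs,
  ...
}:
{
  # mkDefault (priority 1000) loses to a plain assignment (priority 100),
  # so a host can write `services.openssh.enable = false;` without mkForce
  services.openssh.enable = lib.mkDefault true;

  systemd.services.hello-example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      # getExe resolves meta.mainProgram instead of hard-coding "/bin/hello"
      ExecStart = lib.getExe pkgs.hello;
    };
  };
}
```

`lib.mkForce` is priority 50, which is why it should be reserved for overriding a definition you cannot change at its source.
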
## Secrets

- **git-crypt** covers `secrets/**` per the root `.gitattributes`. Initialized with a single symmetric key checked into `secrets/server/git-crypt-key-nixos.age` (agenix-encrypted to the USB SSH identity).
- **agenix** decrypts `*.age` into `/run/agenix/` at activation on every host:
  - **muffin**: identity is `/mnt/usb-secrets/usb-secrets-key` (ssh-ed25519 on a physical USB). Wired in `modules/usb-secrets.nix`.
  - **mreow + yarn**: identity is `/var/lib/agenix/tpm-identity` (an `age-plugin-tpm` handle sealed by the host's TPM 2.0). Wired in `modules/desktop-age-secrets.nix`; yarn persists `/var/lib/agenix` through impermanence.
- **Recipients** are declared in `secrets/secrets.nix`. Desktop secrets are encrypted to the admin SSH key + each host's TPM recipient; server secrets stay encrypted to the muffin USB key.
- **Bootstrap a new desktop**: run `doas scripts/bootstrap-desktop-tpm.sh` on the host. It generates a TPM-sealed identity at `/var/lib/agenix/tpm-identity` and prints an `age1tpm1…` recipient. Append it to the `tpm` list in `secrets/secrets.nix`, run `agenix -r` to re-encrypt, commit, then run `./deploy.sh switch`.
- **Encrypting a new server secret** uses the SSH public key directly with `age -R`:
  ```sh
  age -R <(ssh-keygen -y -f secrets/usb-secrets/usb-secrets-key) \
    -o secrets/server/<name>.age \
    /path/to/plaintext
  ```
  The `<(...)` argument is bash process substitution; from fish, pass `(ssh-keygen -y -f secrets/usb-secrets/usb-secrets-key | psub)` instead. For desktop secrets, prefer `agenix -e secrets/desktop/<name>.age` from a shell with `age-plugin-tpm` on PATH — it reads `secrets/secrets.nix` and encrypts to every recipient listed there.
- **DO NOT use `ssh-to-age`**. It produces `X25519` recipient stanzas, which the SSH private key on muffin cannot decrypt (it only decrypts `ssh-ed25519` stanzas produced by `age -R` against the SSH pubkey). Mismatched stanzas show up as `age: error: no identity matched any of the recipients` at deploy time.
- Never read or commit plaintext secrets. Never log secret values.

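How a module typically consumes one of these secrets, sketched with agenix's `age.secrets` options. The secret name `example-api-key` and the `example` unit are hypothetical; the `.age` path follows the `secrets/server/` layout above, relative to a file in `services/`.

```nix
# services/example.nix (hypothetical excerpt)
{
  config,
  ...
}:
{
  age.secrets.example-api-key = {
    # encrypted file in this repo, decrypted by agenix at activation
    file = ../secrets/server/example-api-key.age;
    # stays root-owned; LoadCredential below gives the unit its own copy
    mode = "0400";
  };

  systemd.services.example.serviceConfig = {
    # .path defaults to /run/agenix/<name>; the service reads
    # $CREDENTIALS_DIRECTORY/api-key at runtime
    LoadCredential = "api-key:${config.age.secrets.example-api-key.path}";
  };
}
```
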
## Service pattern (muffin)

Each file under `services/` follows this shape:

1. `imports` block with `lib.serviceMountWithZpool` and (optionally) `lib.serviceFilePerms`.
2. The service configuration (`services.<name> = { … }`).
3. Caddy reverse-proxy vhost (usually via `lib.mkCaddyReverseProxy` in `lib/default.nix`).
4. Firewall rules (`networking.firewall.allowed{TCP,UDP}Ports`) if externally reachable.
5. `services.fail2ban.jails.<name>` if the service authenticates users.

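A skeleton of the five steps above, as a hedged sketch. The service name, zpool name (`tank`), directory, port, tmpfiles rule, and the `services.example` option path are placeholders. That `lib.mkCaddyReverseProxy` returns a module usable in `imports` (like the mount and permission helpers) is an assumption to verify against `lib/default.nix`, and real files take their port from `hosts/muffin/service-configs.nix` rather than a literal.

```nix
# services/example.nix (hypothetical)
{
  config,
  lib,
  pkgs,
  ...
}:
let
  port = 8123; # placeholder; real files read ports from service-configs.nix
in
{
  imports = [
    # 1. dataset on a zpool plus ownership/mode fixes
    (lib.serviceMountWithZpool "example" "tank" [ "/var/lib/example" ])
    (lib.serviceFilePerms "example" [ "Z /var/lib/example 0750 example example" ])
    # 3. reverse-proxy vhost (assumption: the helper returns a module)
    (lib.mkCaddyReverseProxy {
      subdomain = "example";
      inherit port;
    })
  ];

  # 2. the service itself (placeholder option path)
  services.example = {
    enable = true;
    inherit port;
  };

  # 4. only for public ports; must agree with the public/private split
  # networking.firewall.allowedTCPPorts = [ port ];

  # 5. only if the service authenticates users; see lib.mkFail2banJail
}
```
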
Custom lib helpers (in `lib/default.nix`) to prefer over reinventing:

- `lib.serviceMountWithZpool <service> <zpool> [dirs]`
- `lib.serviceFilePerms <service> [tmpfilesRules]`
- `lib.optimizePackage <pkg>` — applies `-O3 -march=znver3 -mtune=znver3`
- `lib.vpnNamespaceOpenPort <port> <service>` — confines service to the WireGuard namespace
- `lib.mkCaddyReverseProxy { subdomain|domain, port, auth ? false, vpn ? false }`
- `lib.mkFail2banJail { name, unitName ? "${name}.service", failregex }`
- `lib.mkGrafanaAnnotationService { name, description, script, after ? [], environment ? {}, loadCredential ? null }`
- `lib.extractArrApiKey <configXmlPath>` — shell snippet to read the `<ApiKey>` element

Hard requirements (the two port rules are asserted at eval time):

- **Port uniqueness**: every port in `hosts/muffin/service-configs.nix` `ports.{public,private}` must be unique. The flake asserts this.
- **Public/private segregation**: public ports must appear in the firewall allow-list; private ports must not. The flake asserts both directions.
- **Hugepages**: services that need 2 MiB hugepages declare their budget in `service-configs.nix` under `hugepages_2m.services`. The `vm.nr_hugepages` sysctl is derived from the total.
- **PostgreSQL-first**: any service that supports PostgreSQL uses it (via peer-auth Unix socket when possible). Avoid per-service SQLite or similar embedded databases.

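The port and hugepage rules above could be expressed roughly as follows. This is an illustrative sketch, not the flake's actual assertion code: the inline `ports` and `hugepages_2m` attrsets stand in for `service-configs.nix`, only TCP ports are checked, and budgets are assumed to be counts of 2 MiB pages.

```nix
{
  config,
  lib,
  ...
}:
let
  # stand-ins for hosts/muffin/service-configs.nix
  ports = {
    public = {
      https = 443;
    };
    private = {
      grafana = 3000;
    };
  };
  hugepages_2m.services = {
    example = 512; # assumed unit: number of 2 MiB pages
  };

  publicPorts = lib.attrValues ports.public;
  privatePorts = lib.attrValues ports.private;
  allPorts = publicPorts ++ privatePorts;
  allowed = config.networking.firewall.allowedTCPPorts;
in
{
  assertions = [
    {
      # no port is declared twice across public and private
      assertion = lib.length allPorts == lib.length (lib.unique allPorts);
      message = "duplicate port in service-configs.nix";
    }
    {
      assertion = lib.all (p: lib.elem p allowed) publicPorts;
      message = "a public port is missing from the firewall allow-list";
    }
    {
      assertion = lib.all (p: !(lib.elem p allowed)) privatePorts;
      message = "a private port is open in the firewall";
    }
  ];

  # sum the per-service budgets into the kernel setting
  boot.kernel.sysctl."vm.nr_hugepages" = lib.foldl' (acc: n: acc + n) 0 (
    lib.attrValues hugepages_2m.services
  );
}
```
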
## Deploy guard (muffin)

`modules/server-deploy-guard.nix` aggregates per-service "is anyone using this right now?" checks into a single `deploy-guard-check` binary on muffin. Enforcement is **preflight-only**: the guard runs over SSH *before* deploy-rs is invoked, and activation itself is never gated. This matters because deploy-rs sets the new profile pointer before running the activation script, so a failed activation triggers auto-rollback, which re-runs `switch-to-configuration` on the previous generation. That re-activation rotates agenix secrets, reinstalls lanzaboote, and reloads systemd units. The only safe place to stop a deploy is before deploy-rs starts.

Two drivers invoke the preflight:

- **`./deploy.sh muffin`** SSHes to `server-public` and runs `deploy-guard-check`. SSH connection failure is a hard abort (rc=255) because there is no second gate. `./deploy.sh muffin --force` (or `DEPLOY_GUARD_FORCE=1 ./deploy.sh muffin`) skips the preflight entirely.
- **CI (`.gitea/workflows/deploy.yml`)** has a `Deploy guard preflight` step between `Build muffin` and `Deploy via deploy-rs`. A non-zero exit fails the job before any closure copy or activation.

### Adding a new check

In the service's own file (or a sibling `<service>-deploy-guard.nix`):

```nix
{
  config,
  lib,
  pkgs,
  ...
}:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-<service>";
    runtimeInputs = [ /* curl, jq, etc. */ ];
    text = ''
      # exit 0 when the service is idle / unreachable (soft-fail)
      # exit 1 with a reason on stdout/stderr when live users would be disrupted
    '';
  };
in
lib.mkIf config.services.<service>.enable {
  services.deployGuard.checks.<service> = {
    description = "Active <service> users";
    command = check;
  };
}
```

Existing registrations live in `services/jellyfin/jellyfin-deploy-guard.nix` (REST `/Sessions` via curl+jq) and `services/minecraft-deploy-guard.nix` (Server List Ping via `mcstatus`). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.

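A filled-in version of the check template for a hypothetical service. The `example` service, port 8123, and an `/api/sessions` endpoint returning a JSON array of active sessions are all assumptions; the point is the exit-code contract: an unreachable service soft-fails with exit 0, while live sessions exit 1.

```nix
# services/example-deploy-guard.nix (hypothetical)
{
  config,
  lib,
  pkgs,
  ...
}:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-example";
    runtimeInputs = [
      pkgs.curl
      pkgs.jq
    ];
    text = ''
      # unreachable or erroring service: nobody to disrupt, allow the deploy
      if ! body=$(curl -fsS --max-time 5 http://127.0.0.1:8123/api/sessions); then
        echo "example unreachable; treating as idle"
        exit 0
      fi

      # malformed JSON makes jq fail, which (under errexit) blocks the deploy
      count=$(jq 'length' <<<"$body")
      if [ "$count" -gt 0 ]; then
        echo "example has $count active session(s)"
        exit 1
      fi
    '';
  };
in
lib.mkIf config.services.example.enable {
  services.deployGuard.checks.example = {
    description = "Active example sessions";
    command = check;
  };
}
```
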
## Deploy finalize (muffin)

`modules/server-deploy-finalize.nix` solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct `switch-to-configuration switch` restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).

The fix is a two-phase activation wired into `deploy.nodes.muffin.profiles.system.path` in `flake.nix`:

1. `switch-to-configuration boot` — bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
2. `deploy-finalize` — schedules a detached `systemd-run --on-active=N` transient unit (default 60s). The unit is owned by pid1, so it survives the eventual runner restart. If `/run/booted-system/{kernel,initrd,kernel-modules}` differs from the new profile's, the unit runs `systemctl reboot`; otherwise it runs `switch-to-configuration switch`.

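The wiring could look roughly like this, using deploy-rs's `activate.custom`, which runs the given script on the target with `$PROFILE` pointing at the newly set system profile. Calling `deploy-finalize` from the new system's `sw/bin` (i.e. assuming it is in `environment.systemPackages`) and passing it the profile path as an argument are assumptions; check `flake.nix` for the real invocation.

```nix
# flake.nix outputs (excerpt; `deploy-rs` is a flake input, `self` the flake)
deploy.nodes.muffin.profiles.system = {
  user = "root";
  path =
    deploy-rs.lib.x86_64-linux.activate.custom self.nixosConfigurations.muffin.config.system.build.toplevel
      ''
        # phase 1: bootloader entry only, no unit restarts
        "$PROFILE/bin/switch-to-configuration" boot
        # phase 2: hand the live switch (or reboot) to a pid1-owned timer
        "$PROFILE/sw/bin/deploy-finalize" "$PROFILE"
      '';
};
```
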
In other words, muffin reboots only when the kernel, initrd, or kernel modules changed. The 60s delay is tuned so the CI job (or manual `./deploy.sh muffin`) has time to emit status/notification steps before the runner is recycled.

Back-to-back deploys supersede each other: each invocation cancels any still-pending `deploy-finalize-*.timer` before scheduling its own. `deploy-finalize --dry-run` prints the decision without scheduling anything, which is useful when debugging.

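A sketch of the decision described above, written as a callPackage-style `writeShellApplication`. It is not the contents of `modules/server-deploy-finalize.nix`: the name `deploy-finalize-sketch`, the `DEPLOY_FINALIZE_DELAY` variable, and taking the new system path as `$1` (with `--dry-run` as `$2`) are assumptions. It compares the same three store paths that `system.autoUpgrade` compares, cancels pending finalize timers, and schedules a transient unit with `systemd-run --on-active`.

```nix
{
  pkgs,
  ...
}:
pkgs.writeShellApplication {
  name = "deploy-finalize-sketch";
  runtimeInputs = [
    pkgs.coreutils
    pkgs.systemd
  ];
  text = ''
    new="''${1:?usage: deploy-finalize-sketch <new-system-path> [--dry-run]}"
    dry_run="''${2:-}"
    delay="''${DEPLOY_FINALIZE_DELAY:-60}"

    # reboot only if kernel, initrd, or kernel modules differ from the booted system
    action=switch
    for p in kernel initrd kernel-modules; do
      if [ "$(readlink -f "/run/booted-system/$p")" != "$(readlink -f "$new/$p")" ]; then
        action=reboot
      fi
    done

    if [ "$action" = reboot ]; then
      cmd=(systemctl reboot)
    else
      cmd=("$new/bin/switch-to-configuration" switch)
    fi

    if [ "$dry_run" = --dry-run ]; then
      echo "would run in ''${delay}s: ''${cmd[*]}"
      exit 0
    fi

    # a newer deploy supersedes any finalize still waiting to fire
    systemctl stop 'deploy-finalize-*.timer' || true

    # transient timer + service owned by pid1, so it outlives this ssh session
    systemd-run --on-active="$delay" --unit="deploy-finalize-$(date +%s)" "''${cmd[@]}"
  '';
}
```
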
Prior art: the 3-path `{kernel,initrd,kernel-modules}` diff is lifted from nixpkgs's `system.autoUpgrade` module (the `allowReboot = true` branch) and was packaged the same way in [obsidiansystems/obelisk#957](https://github.com/obsidiansystems/obelisk/pull/957). nixpkgs#185030 tracks lifting it into `switch-to-configuration` proper but has been stale since 2025-07. The self-deploy `systemd-run` detachment is the proposed fix from [deploy-rs#153](https://github.com/serokell/deploy-rs/issues/153), also unmerged upstream.

## Technical details

- **Privilege escalation**: `doas` everywhere; `sudo` is disabled on every host.
- **Shell**: fish. Interactive `bash` shells re-exec into fish via `programs.bash.interactiveShellInit` (see `modules/common-shell-fish.nix`).
- **Secure boot**: lanzaboote. Every host extracts keys from an agenix-decrypted tar at activation — desktops via `modules/desktop-lanzaboote-agenix.nix`, muffin via `modules/server-lanzaboote-agenix.nix`.
- **Impermanence**: muffin is tmpfs-root with `/persistent` surviving reboots (`modules/server-impermanence.nix`); yarn binds `/home/primary` from `/persistent` (`hosts/yarn/impermanence.nix`).
- **Disks**: disko.
- **Binary cache**: muffin runs harmonia; desktops consume it at `https://nix-cache.sigkill.computer`.
- **Kernel**:
  - Desktops: `linux-cachyos-bore-lto`, `processorOpt = "x86_64-v3"` (see `modules/desktop-common.nix` — also trims ~80 legacy subsystems).
  - muffin: `linuxPackages_6_12` (pinned; 6.18 has a ZFS deadlock in `dbuf_evict`).
- **Domain**: `sigkill.computer`. The old `gardling.com` redirects automatically.

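For orientation, the doas and fish pieces commonly look like this on NixOS. The bash-to-fish re-exec is the widely used snippet from the NixOS wiki's Fish page; the repo's `modules/common-doas.nix` and `modules/common-shell-fish.nix` may differ in detail, and the `wheel`-group doas rule is an assumption.

```nix
{
  lib,
  pkgs,
  ...
}:
{
  # doas replaces sudo entirely
  security.sudo.enable = false;
  security.doas = {
    enable = true;
    extraRules = [
      {
        groups = [ "wheel" ];
        keepEnv = true;
        persist = true;
      }
    ];
  };

  programs.fish.enable = true;

  # interactive bash execs into fish unless its parent is already fish
  # or bash was started to run a command string (bash -c ...)
  programs.bash.interactiveShellInit = ''
    if [[ $(${pkgs.procps}/bin/ps --no-header --pid=$PPID --format=comm) != "fish" && -z ''${BASH_EXECUTION_STRING} ]]
    then
      shopt -q login_shell && LOGIN_OPTION='--login' || LOGIN_OPTION=""
      exec ${lib.getExe pkgs.fish} $LOGIN_OPTION
    fi
  '';
}
```
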
## Agent-specific instructions

- If instructed to commit, **disable GPG signing** (`git commit --no-gpg-sign`). The author's GPG key is not available in this environment.
- Use `nix-shell -p <package>` if a tool is missing from the environment.
- For `nix build`, always append `-L` for verbose logs.
- If Nix reports a missing file, run `git add <file>` first — flakes only see git-tracked files.
- Do not read files under `secrets/`.
- Run `nix fmt` after editing any `.nix` file.
- Validate every change with `nix build .#nixosConfigurations.<host>.config.system.build.toplevel -L`.
- Commit messages are terse, lowercase; prefix with `<scope>:` when narrowly scoped (`caddy: add redirect`, `zfs: remove unneeded options`, `mreow: bump kernel`). Generic changes use `update` or a short description.