deploy: potentially fix self-deploy issue?
This commit is contained in:
15
AGENTS.md
15
AGENTS.md
@@ -191,6 +191,21 @@ lib.mkIf config.services.<service>.enable {
|
||||
|
||||
Existing registrations live in `services/jellyfin/jellyfin-deploy-guard.nix` (REST `/Sessions` via curl+jq) and `services/minecraft-deploy-guard.nix` (Server List Ping via `mcstatus`). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.
|
||||
|
||||
## Deploy finalize (muffin)
|
||||
|
||||
`modules/server-deploy-finalize.nix` solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct `switch-to-configuration switch` restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).
|
||||
|
||||
The fix is a two-phase activation wired into `deploy.nodes.muffin.profiles.system.path` in `flake.nix`:
|
||||
|
||||
1. `switch-to-configuration boot` — bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
|
||||
2. `deploy-finalize` — schedules a detached `systemd-run --on-active=N` transient unit (default 60s). The unit is owned by pid1, so it survives the eventual runner restart. If `/run/booted-system/{kernel,initrd,kernel-modules}` differs from the new profile's, the unit runs `systemctl reboot`; otherwise it runs `switch-to-configuration switch`.
|
||||
|
||||
That is, reboot is dynamically gated on kernel/initrd/kernel-modules change. The 60s delay is tuned so the CI job (or manual `./deploy.sh muffin`) has time to emit status/notification steps before the runner is recycled.
|
||||
|
||||
Back-to-back deploys supersede each other: each invocation cancels any still-pending `deploy-finalize-*.timer` before scheduling its own. `deploy-finalize --dry-run` prints the decision without scheduling anything — useful when debugging.
|
||||
|
||||
Prior art: the 3-path `{kernel,initrd,kernel-modules}` diff is lifted from nixpkgs's `system.autoUpgrade` module (the `allowReboot = true` branch) and was packaged the same way in [obsidiansystems/obelisk#957](https://github.com/obsidiansystems/obelisk/pull/957). nixpkgs#185030 tracks lifting it into `switch-to-configuration` proper but has been stale since 2025-07. The self-deploy `systemd-run` detachment is the proposed fix from [deploy-rs#153](https://github.com/serokell/deploy-rs/issues/153), also unmerged upstream.
|
||||
|
||||
## Technical details
|
||||
|
||||
- **Privilege escalation**: `doas` everywhere; `sudo` is disabled on every host.
|
||||
|
||||
Reference in New Issue
Block a user