deploy-guard: block activation while users are online
- modules/server-deploy-guard.nix: extendable aggregator registered via
services.deployGuard.checks.<name>.{description,command}. Installs
deploy-guard-check with per-check timeout, pass/block reporting, JSON
output, DEPLOY_GUARD_BYPASS / /run/deploy-guard-bypass (single-shot).
- services/jellyfin/jellyfin-deploy-guard.nix: curl+jq on /Sessions,
blocks when any session carries NowPlayingItem; soft-fails when unreachable.
- services/minecraft-deploy-guard.nix: mcstatus SLP query on 25565, blocks
when players.online > 0; soft-fails when unreachable.
- flake.nix: wrap deploy.nodes.muffin activation with activate.custom so
deploy-guard-check runs before switch-to-configuration. Auto-rollback
catches the failure. dryActivate/boot branches preserved.
- deploy.sh: SSH preflight for ./deploy.sh muffin with --force /
DEPLOY_GUARD_FORCE=1 (touches remote bypass marker). Connectivity
failure is soft; activation still enforces.
- tests/deploy-guard.nix: aggregator contract, bypass mechanics, timeout,
JSON output.
33
AGENTS.md
@@ -156,6 +156,39 @@ Hard requirements that are asserted at eval time:
- **Hugepages**: services that need 2 MiB hugepages declare their budget in `service-configs.nix` under `hugepages_2m.services`. The `vm.nr_hugepages` sysctl is derived from the total.
- **PostgreSQL-first**: any service that supports PostgreSQL uses it (via peer-auth Unix socket when possible). Per-service SQLite (or similar) is discouraged.

## Deploy guard (muffin)

`modules/server-deploy-guard.nix` blocks `./deploy.sh muffin` / deploy-rs activation when a service it covers is in active use. Two paths enforce it:

- **Preflight**: `./deploy.sh muffin` SSHes to `server-public` and runs `deploy-guard-check` before the build. Connectivity failure is soft (activation still enforces). `./deploy.sh muffin --force` or `DEPLOY_GUARD_FORCE=1 ./deploy.sh muffin` touches `/run/deploy-guard-bypass` remotely (single-shot) and skips the preflight.
- **Activation**: the `activate.custom` wrapper in `flake.nix` runs `$PROFILE/sw/bin/deploy-guard-check` before `switch-to-configuration switch`. A non-zero exit triggers deploy-rs auto-rollback. The same bypass applies: set `DEPLOY_GUARD_BYPASS=1` in the environment or pre-touch `/run/deploy-guard-bypass`.
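
The single-shot bypass described above could look roughly like the following. This is a minimal sketch, not the module's actual code: the function name `bypass_requested` is hypothetical, and making `BYPASS_MARKER` overridable is an assumption added only so the sketch can be tried without root.

```shell
# Hypothetical sketch of single-shot bypass handling (not the real module code).
# BYPASS_MARKER defaults to the path named in this document; the override is
# an assumption for local experimentation.
BYPASS_MARKER="${BYPASS_MARKER:-/run/deploy-guard-bypass}"

# Returns 0 when the guard should be skipped, 1 when the checks must run.
bypass_requested() {
  # Environment bypass: applies only to the invocation that sets it.
  if [ "${DEPLOY_GUARD_BYPASS:-0}" = "1" ]; then
    return 0
  fi
  # Marker bypass: consume the file so the next deploy is guarded again.
  if [ -e "$BYPASS_MARKER" ]; then
    rm -f -- "$BYPASS_MARKER"
    return 0
  fi
  return 1
}
```

A caller would test `bypass_requested` first and run the registered checks only when it returns non-zero. Removing the marker on first use is what makes the bypass single-shot.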

### Adding a new check

In the service's own file (or a sibling `<service>-deploy-guard.nix`):

```nix
{ config, lib, pkgs, ... }:
let
  check = pkgs.writeShellApplication {
    name = "deploy-guard-check-<service>";
    runtimeInputs = [ /* curl, jq, etc. */ ];
    text = ''
      # exit 0 when the service is idle / unreachable (soft-fail)
      # exit 1 with a reason on stdout/stderr when live users would be disrupted
    '';
  };
in
lib.mkIf config.services.<service>.enable {
  services.deployGuard.checks.<service> = {
    description = "Active <service> users";
    command = check;
  };
}
```

Existing registrations live in `services/jellyfin/jellyfin-deploy-guard.nix` (REST `/Sessions` via curl+jq) and `services/minecraft-deploy-guard.nix` (Server List Ping via `mcstatus`). Prefer soft-fail when a service is unreachable: a service that is already down has no users to disrupt.
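
The soft-fail convention can be sketched as a small shell helper that a check's `text` might follow. Everything here is illustrative: `guard_decide` is a hypothetical name, the probe is assumed to print an active-user count and exit non-zero when the service is unreachable, and treating unparseable output as idle is an assumption rather than documented behaviour.

```shell
# Hypothetical helper: turn a probe's output into a pass/block decision.
# Usage: guard_decide <name> <probe command...>
# Assumption: the probe prints a non-negative integer (active users) and
# fails with a non-zero status when the service cannot be reached.
guard_decide() {
  local name="$1"; shift
  local count
  if ! count="$("$@" 2>/dev/null)"; then
    # Soft-fail: an unreachable service has no users to disrupt.
    echo "$name: unreachable, treating as idle"
    return 0
  fi
  case "$count" in
    ''|*[!0-9]*)
      # Assumption: garbage output is treated like an unreachable service.
      echo "$name: unparseable probe output, treating as idle"
      return 0
      ;;
  esac
  if [ "$count" -gt 0 ]; then
    echo "$name: $count active user(s), blocking deploy"
    return 1
  fi
  echo "$name: idle"
  return 0
}
```

For example, `guard_decide demo echo 2` prints a blocking reason and returns 1, while `guard_decide demo false` soft-fails and returns 0. A real check would substitute its own probe, such as the curl+jq or `mcstatus` queries mentioned above.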

## Technical details

- **Privilege escalation**: `doas` everywhere; `sudo` is disabled on every host.