gitea: fix actions visibility

deploy: potentially fix self-deploy issue?
2026-04-22 23:02:53 -04:00 · 2026-04-22 23:02:38 -04:00
8 changed files with 491 additions and 25 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -191,6 +191,21 @@ lib.mkIf config.services.<service>.enable {

 Existing registrations live in `services/jellyfin/jellyfin-deploy-guard.nix` (REST `/Sessions` via curl+jq) and `services/minecraft-deploy-guard.nix` (Server List Ping via `mcstatus`). Prefer soft-fail on unreachable — a service that's already down has no users to disrupt.

+## Deploy finalize (muffin)
+
+`modules/server-deploy-finalize.nix` solves the self-deploy problem: the gitea-actions runner driving CI deploys lives on muffin itself, so a direct `switch-to-configuration switch` restarts the runner mid-activation, killing the SSH session, the CI job, and deploy-rs's magic-rollback handshake. The failure mode is visible as "deploy appears to fail even though the new config landed" (or worse, a rollback storm).
+
+The fix is a two-phase activation wired into `deploy.nodes.muffin.profiles.system.path` in `flake.nix`:
+
+1. `switch-to-configuration boot` — bootloader-only, no service restarts. The runner, SSH session, and magic-rollback survive.
+2. `deploy-finalize` — schedules a detached `systemd-run --on-active=N` transient unit (default 60s). The unit is owned by pid1, so it survives the eventual runner restart. If `/run/booted-system/{kernel,initrd,kernel-modules}` differs from the new profile's, the unit runs `systemctl reboot`; otherwise it runs `switch-to-configuration switch`.
+
+That is, reboot is dynamically gated on kernel/initrd/kernel-modules change. The 60s delay is tuned so the CI job (or manual `./deploy.sh muffin`) has time to emit status/notification steps before the runner is recycled.
+
+Back-to-back deploys supersede each other: each invocation cancels any still-pending `deploy-finalize-*.timer` before scheduling its own. `deploy-finalize --dry-run` prints the decision without scheduling anything — useful when debugging.
+
+Prior art: the 3-path `{kernel,initrd,kernel-modules}` diff is lifted from nixpkgs's `system.autoUpgrade` module (the `allowReboot = true` branch) and was packaged the same way in [obsidiansystems/obelisk#957](https://github.com/obsidiansystems/obelisk/pull/957). nixpkgs#185030 tracks lifting it into `switch-to-configuration` proper but has been stale since 2025-07. The self-deploy `systemd-run` detachment is the proposed fix from [deploy-rs#153](https://github.com/serokell/deploy-rs/issues/153), also unmerged upstream.
+
 ## Technical details

 - **Privilege escalation**: `doas` everywhere; `sudo` is disabled on every host.
--- a/flake.nix
+++ b/flake.nix
@@ -415,7 +415,27 @@
          # want to avoid when the deploy is supposed to be a no-op blocked by
          # the guard. Blocking before the deploy-rs invocation is the only
          # clean way to leave the running system untouched.
-          path = deploy-rs.lib.${system}.activate.nixos self.nixosConfigurations.muffin;
+          #
+          # Activation uses `switch-to-configuration boot` + a detached finalize
+          # (modules/server-deploy-finalize.nix) rather than the default
+          # `switch`. The gitea-actions runner driving CI deploys lives on
+          # muffin itself; a direct `switch` restarts gitea-runner-muffin mid-
+          # activation, killing the SSH session, the CI job, and deploy-rs's
+          # magic-rollback handshake. `boot` only touches the bootloader — no
+          # service restarts — and deploy-finalize schedules a pid1-owned
+          # transient unit that runs the real `switch` (or `systemctl reboot`
+          # when kernel/initrd/kernel-modules changed) ~60s later, surviving
+          # runner restart because it's decoupled from the SSH session.
+          path =
+            deploy-rs.lib.${system}.activate.custom self.nixosConfigurations.muffin.config.system.build.toplevel
+              ''
+                # matches activate.nixos's workaround for NixOS/nixpkgs#73404
+                cd /tmp
+
+                $PROFILE/bin/switch-to-configuration boot
+
+                ${nixpkgs-stable.lib.getExe self.nixosConfigurations.muffin.config.system.build.deployFinalize}
+              '';
        };
      };

--- a/hosts/muffin/default.nix
+++ b/hosts/muffin/default.nix
@@ -26,6 +26,7 @@
    ../../modules/ntfy-alerts.nix
    ../../modules/server-power.nix
    ../../modules/server-deploy-guard.nix
+    ../../modules/server-deploy-finalize.nix

    ../../services/postgresql.nix
    ../../services/jellyfin
@@ -99,6 +100,13 @@

  services.deployGuard.enable = true;

+  # Detached deploy finalize: see modules/server-deploy-finalize.nix. deploy-rs
+  # activates in `boot` mode and invokes deploy-finalize to schedule the real
+  # `switch` (or reboot, when kernel/initrd/kernel-modules changed) 60s later
+  # as a pid1-owned transient unit. Prevents the self-hosted gitea runner from
+  # being restarted mid-CI-deploy.
+  services.deployFinalize.enable = true;
+
  # Disable serial getty on ttyS0 to prevent dmesg warnings
  systemd.services."serial-getty@ttyS0".enable = false;

--- a/modules/server-deploy-finalize.nix
+++ b/modules/server-deploy-finalize.nix
@@ -0,0 +1,187 @@
+# Deferred deploy finalize for deploy-rs-driven hosts.
+#
+# When deploy-rs activates via `switch-to-configuration switch` and the gitea-
+# actions runner driving the deploy lives on the same host, the runner unit
+# gets restarted mid-activation — its definition changes between builds. That
+# restart kills the SSH session, the CI job, and deploy-rs's magic-rollback
+# handshake, so CI reports failure even when the deploy itself completed.
+# This is deploy-rs#153, open since 2022.
+#
+# This module breaks the dependency: activation does `switch-to-configuration
+# boot` (bootloader only, no service restarts), then invokes deploy-finalize
+# which schedules a detached systemd transient unit that fires `delay` seconds
+# later with the real `switch` (or `systemctl reboot` when the kernel, initrd,
+# or kernel-modules changed since boot). The transient unit is owned by pid1,
+# so it survives the runner's eventual restart — by which time the CI job has
+# finished reporting.
+#
+# Prior art (reboot-or-switch logic, not the self-deploy detachment):
+# - nixpkgs `system.autoUpgrade` (allowReboot = true branch) is the canonical
+#   source of the 3-path {initrd,kernel,kernel-modules} comparison.
+# - obsidiansystems/obelisk#957 merged the same snippet into `ob deploy` for
+#   push-based remote deploys — but doesn't need detachment since its deployer
+#   lives on a different machine from the target.
+# - nixpkgs#185030 tracks lifting this into switch-to-configuration proper.
+#   Stale since 2025-07; until it lands, every downstream reimplements it.
+#
+# Bootstrap note: the activation snippet resolves deploy-finalize via
+# lib.getExe (store path), not via `/run/current-system/sw/bin` — `boot` mode
+# does not update `/run/current-system`, so the old binary would be resolved.
+{
+  config,
+  lib,
+  pkgs,
+  ...
+}:
+let
+  cfg = config.services.deployFinalize;
+
+  finalize = pkgs.writeShellApplication {
+    name = "deploy-finalize";
+    runtimeInputs = [
+      pkgs.coreutils
+      pkgs.systemd
+    ];
+    text = ''
+      delay=${toString cfg.delay}
+      profile=/nix/var/nix/profiles/system
+      dry_run=0
+
+      usage() {
+        cat <<EOF
+      Usage: deploy-finalize [--dry-run] [--delay N] [--profile PATH]
+
+      Compares /run/booted-system against PATH (default /nix/var/nix/profiles/system)
+      and schedules either \`systemctl reboot\` (kernel or initrd changed) or
+      \`switch-to-configuration switch\` (services only) via a detached systemd-run
+      timer firing N seconds later.
+
+      Options:
+        --dry-run       Print the decision and would-be command without scheduling.
+        --delay N       Override the delay in seconds. Default: ${toString cfg.delay}.
+        --profile PATH  Override the profile path used for comparison.
+      EOF
+      }
+
+      while [[ $# -gt 0 ]]; do
+        case "$1" in
+          --dry-run) dry_run=1; shift ;;
+          --delay) delay="$2"; shift 2 ;;
+          --profile) profile="$2"; shift 2 ;;
+          -h|--help) usage; exit 0 ;;
+          *)
+            echo "deploy-finalize: unknown option $1" >&2
+            usage >&2
+            exit 2
+            ;;
+        esac
+      done
+
+      # Comparing {kernel,initrd,kernel-modules} matches nixpkgs's canonical
+      # `system.autoUpgrade` allowReboot logic. -e (not -f) so a dangling
+      # symlink counts as missing: on a real NixOS profile all three exist,
+      # but defensive: if a profile has bad symlinks we refuse to schedule
+      # rather than scheduling against ghost paths.
+      booted_kernel="$(readlink -e /run/booted-system/kernel 2>/dev/null || true)"
+      booted_initrd="$(readlink -e /run/booted-system/initrd 2>/dev/null || true)"
+      booted_modules="$(readlink -e /run/booted-system/kernel-modules 2>/dev/null || true)"
+      new_kernel="$(readlink -e "$profile/kernel" 2>/dev/null || true)"
+      new_initrd="$(readlink -e "$profile/initrd" 2>/dev/null || true)"
+      new_modules="$(readlink -e "$profile/kernel-modules" 2>/dev/null || true)"
+
+      if [[ -z "$new_kernel" || -z "$new_initrd" || -z "$new_modules" ]]; then
+        echo "deploy-finalize: refusing to schedule — $profile is missing kernel, initrd, or kernel-modules" >&2
+        exit 1
+      fi
+
+      changed=()
+      if [[ -z "$booted_kernel" || -z "$booted_initrd" || -z "$booted_modules" ]]; then
+        # Unreachable on a booted NixOS, but fail closed on reboot.
+        changed+=("/run/booted-system incomplete")
+      fi
+      [[ "$booted_kernel" != "$new_kernel" ]] && changed+=("kernel")
+      [[ "$booted_initrd" != "$new_initrd" ]] && changed+=("initrd")
+      [[ "$booted_modules" != "$new_modules" ]] && changed+=("kernel-modules")
+
+      reboot_needed=0
+      reason=""
+      if [[ ''${#changed[@]} -gt 0 ]]; then
+        reboot_needed=1
+        # Join with commas so the reason reads as e.g. `kernel,initrd changed`.
+        reason="$(IFS=, ; echo "''${changed[*]}") changed"
+      fi
+
+      if [[ "$reboot_needed" == 1 ]]; then
+        action=reboot
+        cmd="systemctl reboot"
+      else
+        action=switch
+        reason="services only"
+        cmd="$profile/bin/switch-to-configuration switch"
+      fi
+
+      # Nanosecond suffix so back-to-back deploys don't collide on unit names.
+      unit="deploy-finalize-$(date +%s%N)"
+
+      printf 'deploy-finalize: booted_kernel=%s\n'         "$booted_kernel"
+      printf 'deploy-finalize: new_kernel=%s\n'            "$new_kernel"
+      printf 'deploy-finalize: booted_initrd=%s\n'         "$booted_initrd"
+      printf 'deploy-finalize: new_initrd=%s\n'            "$new_initrd"
+      printf 'deploy-finalize: booted_kernel-modules=%s\n' "$booted_modules"
+      printf 'deploy-finalize: new_kernel-modules=%s\n'    "$new_modules"
+      printf 'deploy-finalize: action=%s reason=%s delay=%ss unit=%s\n' \
+        "$action" "$reason" "$delay" "$unit"
+
+      if [[ "$dry_run" == 1 ]]; then
+        printf 'deploy-finalize: dry-run — not scheduling\n'
+        printf 'deploy-finalize: would run: %s\n' "$cmd"
+        printf 'deploy-finalize: would schedule: systemd-run --collect --unit=%s --on-active=%s\n' \
+          "$unit" "$delay"
+        exit 0
+      fi
+
+      # Cancel any still-pending finalize timers from an earlier deploy so this
+      # invocation is authoritative. Without this a stale timer could fire with
+      # the old profile's action (reboot/switch) against the new profile and
+      # briefly run new userspace under the old kernel.
+      systemctl stop 'deploy-finalize-*.timer' 2>/dev/null || true
+
+      # --on-active arms a transient timer owned by pid1. systemd-run returns
+      # once the timer is armed; the SSH session that called us can exit and
+      # the gitea-runner can be restarted (by the switch the timer fires)
+      # without affecting whether the finalize runs.
+      systemd-run \
+        --collect \
+        --unit="$unit" \
+        --description="Finalize NixOS deploy ($action after boot-mode activation)" \
+        --on-active="$delay" \
+        /bin/sh -c "$cmd"
+    '';
+  };
+in
+{
+  options.services.deployFinalize = {
+    enable = lib.mkEnableOption "deferred deploy finalize (switch or reboot) after boot-mode activation";
+
+    delay = lib.mkOption {
+      type = lib.types.ints.positive;
+      default = 60;
+      description = ''
+        Seconds between the deploy-rs activation completing and the scheduled
+        finalize firing. Tuned so the CI job (or manual SSH session) has time
+        to complete status reporting before the runner is restarted by the
+        eventual switch-to-configuration.
+      '';
+    };
+  };
+
+  config = lib.mkIf cfg.enable {
+    environment.systemPackages = [ finalize ];
+
+    # Exposed for the deploy-rs activation snippet to reference by /nix/store
+    # path via lib.getExe — `boot` mode does not update /run/current-system,
+    # so reading through /run/current-system/sw/bin would resolve to the OLD
+    # binary on a new-feature rollout or immediately after a rollback.
+    system.build.deployFinalize = finalize;
+  };
+}
--- a/services/gitea/gitea.nix
+++ b/services/gitea/gitea.nix
@@ -50,14 +50,14 @@
  };

  # Hide repo Actions/workflow details from anonymous visitors. Gitea's own
-  # REQUIRE_SIGNIN_VIEW=expensive mode does not cover /{user}/{repo}/actions,
-  # so we gate the path at Caddy: forward_auth probes Gitea's /api/v1/user
-  # with the incoming request's Cookie/Authorization headers. A logged-in
-  # session answers 200 and the original request falls through to the
-  # reverse_proxy from mkCaddyReverseProxy; a 401 is turned into a redirect
-  # to the login page so the browser shows the login form instead of the
-  # workflow list. Workflow status badges stay public so README links keep
-  # rendering.
+  # REQUIRE_SIGNIN_VIEW=expensive does not cover /{user}/{repo}/actions, and
+  # the API auth chain (routers/api/v1/api.go buildAuthGroup) deliberately
+  # omits `auth_service.Session`, so an /api/v1/user probe would 401 even
+  # for logged-in browser sessions. We gate at Caddy instead: forward_auth
+  # probes a lightweight *web-UI* endpoint that does accept session cookies,
+  # and Gitea's own reqSignIn middleware answers 303 to /user/login for
+  # anonymous callers which we rewrite to preserve the original URL.
+  # Workflow status badges stay public so README links keep rendering.
  services.caddy.virtualHosts.${service_configs.gitea.domain}.extraConfig = ''
    @repoActionsNotBadge {
      path_regexp ^/[^/]+/[^/]+/actions(/.*)?$
@@ -65,9 +65,9 @@
    }
    handle @repoActionsNotBadge {
      forward_auth :${toString service_configs.ports.private.gitea.port} {
-        uri /api/v1/user
+        uri /user/stopwatches

-        @unauthorized status 401
+        @unauthorized status 302 303
        handle_response @unauthorized {
          redir * /user/login?redirect_to={uri} 302
        }
--- a/tests/deploy-finalize.nix
+++ b/tests/deploy-finalize.nix
@@ -0,0 +1,196 @@
+# Test for modules/server-deploy-finalize.nix.
+#
+# Covers the decision and scheduling logic with fabricated profile directories,
+# since spawning a second booted NixOS toplevel to diff kernels is too heavy for
+# a runNixOSTest. We rely on the shellcheck pass baked into writeShellApplication
+# to catch syntax regressions in the script itself.
+{
+  lib,
+  pkgs,
+  inputs,
+  ...
+}:
+pkgs.testers.runNixOSTest {
+  name = "deploy-finalize";
+
+  node.specialArgs = {
+    inherit inputs lib;
+    username = "testuser";
+  };
+
+  nodes.machine =
+    { ... }:
+    {
+      imports = [
+        ../modules/server-deploy-finalize.nix
+      ];
+
+      services.deployFinalize = {
+        enable = true;
+        # Shorter default in the test to make expected-substring assertions
+        # stable and reinforce that the option is wired through.
+        delay = 15;
+      };
+    };
+
+  testScript = ''
+    start_all()
+    machine.wait_for_unit("multi-user.target")
+
+    # Test fixtures: fabricated profile trees whose kernel/initrd/kernel-modules
+    # symlinks are under test control. `readlink -e` requires the targets to
+    # exist, so we point at real files in /tmp rather than non-existent paths.
+    machine.succeed(
+        "mkdir -p /tmp/profile-same /tmp/profile-changed-kernel "
+        "/tmp/profile-changed-initrd /tmp/profile-changed-modules "
+        "/tmp/profile-missing /tmp/fake-targets"
+    )
+    machine.succeed(
+        "touch /tmp/fake-targets/alt-kernel /tmp/fake-targets/alt-initrd "
+        "/tmp/fake-targets/alt-modules"
+    )
+
+    booted_kernel  = machine.succeed("readlink -e /run/booted-system/kernel").strip()
+    booted_initrd  = machine.succeed("readlink -e /run/booted-system/initrd").strip()
+    booted_modules = machine.succeed("readlink -e /run/booted-system/kernel-modules").strip()
+
+    def link_profile(path, kernel, initrd, modules):
+        machine.succeed(f"ln -sf {kernel} {path}/kernel")
+        machine.succeed(f"ln -sf {initrd} {path}/initrd")
+        machine.succeed(f"ln -sf {modules} {path}/kernel-modules")
+
+    # profile-same: matches booted exactly → should choose `switch`.
+    link_profile("/tmp/profile-same", booted_kernel, booted_initrd, booted_modules)
+    machine.succeed("mkdir -p /tmp/profile-same/bin")
+    machine.succeed(
+        "ln -sf /run/current-system/bin/switch-to-configuration "
+        "/tmp/profile-same/bin/switch-to-configuration"
+    )
+
+    # profile-changed-kernel: kernel differs only → should choose `reboot`.
+    link_profile(
+        "/tmp/profile-changed-kernel",
+        "/tmp/fake-targets/alt-kernel",
+        booted_initrd,
+        booted_modules,
+    )
+
+    # profile-changed-initrd: initrd differs only → should choose `reboot`.
+    link_profile(
+        "/tmp/profile-changed-initrd",
+        booted_kernel,
+        "/tmp/fake-targets/alt-initrd",
+        booted_modules,
+    )
+
+    # profile-changed-modules: kernel-modules differs only → should choose `reboot`.
+    # Catches the obelisk PR / nixpkgs auto-upgrade case where modules rebuild
+    # against the same kernel but ABI-incompatible.
+    link_profile(
+        "/tmp/profile-changed-modules",
+        booted_kernel,
+        booted_initrd,
+        "/tmp/fake-targets/alt-modules",
+    )
+
+    # profile-missing: no kernel/initrd/kernel-modules → should fail closed.
+
+    with subtest("dry-run against identical profile selects switch"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --profile /tmp/profile-same 2>&1"
+        )
+        assert rc == 0, f"rc={rc}\n{out}"
+        assert "action=switch" in out, out
+        assert "services only" in out, out
+        assert "dry-run — not scheduling" in out, out
+        assert "would run: /tmp/profile-same/bin/switch-to-configuration switch" in out, out
+        assert "would schedule: systemd-run" in out, out
+
+    with subtest("dry-run against changed-kernel profile selects reboot"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --profile /tmp/profile-changed-kernel 2>&1"
+        )
+        assert rc == 0, f"rc={rc}\n{out}"
+        assert "action=reboot" in out, out
+        assert "reason=kernel changed" in out, out
+        assert "systemctl reboot" in out, out
+
+    with subtest("dry-run against changed-initrd profile selects reboot"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --profile /tmp/profile-changed-initrd 2>&1"
+        )
+        assert rc == 0, f"rc={rc}\n{out}"
+        assert "action=reboot" in out, out
+        assert "reason=initrd changed" in out, out
+
+    with subtest("dry-run against changed-modules profile selects reboot"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --profile /tmp/profile-changed-modules 2>&1"
+        )
+        assert rc == 0, f"rc={rc}\n{out}"
+        assert "action=reboot" in out, out
+        assert "reason=kernel-modules changed" in out, out
+
+    with subtest("dry-run against empty profile fails closed with rc=1"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --profile /tmp/profile-missing 2>&1"
+        )
+        assert rc == 1, f"rc={rc}\n{out}"
+        assert "missing kernel, initrd, or kernel-modules" in out, out
+
+    with subtest("--delay override is reflected in output"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --delay 7 --profile /tmp/profile-same 2>&1"
+        )
+        assert rc == 0, f"rc={rc}\n{out}"
+        assert "delay=7s" in out, out
+
+    with subtest("configured default delay from module option is used"):
+        rc, out = machine.execute(
+            "deploy-finalize --dry-run --profile /tmp/profile-same 2>&1"
+        )
+        assert rc == 0, f"rc={rc}\n{out}"
+        # module option delay=15 in nodes.machine above.
+        assert "delay=15s" in out, out
+
+    with subtest("unknown option rejected with rc=2"):
+        rc, out = machine.execute("deploy-finalize --bogus 2>&1")
+        assert rc == 2, f"rc={rc}\n{out}"
+        assert "unknown option --bogus" in out, out
+
+    with subtest("non-dry run arms a transient systemd timer"):
+        # Long delay so the timer doesn't fire during the test. We stop it
+        # explicitly afterwards.
+        rc, out = machine.execute(
+            "deploy-finalize --delay 3600 --profile /tmp/profile-same 2>&1"
+        )
+        assert rc == 0, f"scheduling rc={rc}\n{out}"
+        # Confirm exactly one transient timer is active.
+        timers = machine.succeed(
+            "systemctl list-units --type=timer --no-legend 'deploy-finalize-*.timer' "
+            "--state=waiting | awk 'NF{print $1}'"
+        ).strip().splitlines()
+        assert len(timers) == 1, f"expected exactly one pending timer, got {timers}"
+        assert timers[0].startswith("deploy-finalize-"), timers
+
+    with subtest("back-to-back scheduling cancels the previous timer"):
+        # The previous subtest left one timer armed. Schedule again; the old
+        # one should be stopped before the new unit name is created.
+        machine.succeed("sleep 1")  # ensure a distinct unit-name timestamp
+        rc, out = machine.execute(
+            "deploy-finalize --delay 3600 --profile /tmp/profile-same 2>&1"
+        )
+        assert rc == 0, f"second-schedule rc={rc}\n{out}"
+        timers = machine.succeed(
+            "systemctl list-units --type=timer --no-legend 'deploy-finalize-*.timer' "
+            "--state=waiting | awk 'NF{print $1}'"
+        ).strip().splitlines()
+        assert len(timers) == 1, f"expected only the new timer, got {timers}"
+
+        # Clean up so the test's shutdown path is quiet.
+        machine.succeed(
+            "systemctl stop 'deploy-finalize-*.timer' 'deploy-finalize-*.service' "
+            "2>/dev/null || true"
+        )
+  '';
+}
--- a/tests/gitea-hide-actions.nix
+++ b/tests/gitea-hide-actions.nix
@@ -67,9 +67,14 @@ pkgs.testers.runNixOSTest {
        # `service_configs.gitea.domain` string, which here is the full URL
        # `http://server`. Override to valid bare values so Gitea doesn't
        # get a malformed ROOT_URL like `https://http://server`.
-        services.gitea.settings.server = {
-          DOMAIN = lib.mkForce "server";
-          ROOT_URL = lib.mkForce "http://server/";
+        services.gitea.settings = {
+          server = {
+            DOMAIN = lib.mkForce "server";
+            ROOT_URL = lib.mkForce "http://server/";
+          };
+          # Tests talk HTTP, so drop the Secure flag — otherwise curl's cookie
+          # jar holds the session cookie but never sends it back.
+          session.COOKIE_SECURE = lib.mkForce false;
        };
        services.caddy = {
          enable = true;
@@ -103,6 +108,8 @@ pkgs.testers.runNixOSTest {
  };

  testScript = ''
+    import re
+
    start_all()
    server.wait_for_unit("postgresql.service")
    server.wait_for_unit("gitea.service")
@@ -110,7 +117,6 @@ pkgs.testers.runNixOSTest {
    server.wait_for_open_port(3000)
    server.wait_for_open_port(80)

-    # Admin user — used to exercise the authenticated path via Basic Auth.
    server.succeed(
        "su -l gitea -s /bin/sh -c '${pkgs.gitea}/bin/gitea admin user create "
        "--username testuser --password testpassword "
@@ -118,13 +124,44 @@ pkgs.testers.runNixOSTest {
        "--work-path /var/lib/gitea'"
    )

-    def curl(args):
+    def curl(args, cookies=None):
+        cookie_args = f"-b {cookies} " if cookies else ""
        cmd = (
            "curl -4 -s -o /dev/null "
-            "-w '%{http_code}|%{redirect_url}' " + args
+            f"-w '%{{http_code}}|%{{redirect_url}}' {cookie_args}{args}"
        )
        return client.succeed(cmd).strip()

+    def login():
+        # Gitea's POST /user/login requires a _csrf token and expects the
+        # matching session cookie already set. Fetch the login form first
+        # to harvest both, then submit credentials with the same cookie jar.
+        client.succeed("rm -f /tmp/cookies.txt")
+        html = client.succeed(
+            "curl -4 -s -c /tmp/cookies.txt http://server/user/login"
+        )
+        match = re.search(r'name="_csrf"\s+value="([^"]+)"', html)
+        assert match, f"CSRF token not found in login form: {html[:500]!r}"
+        csrf = match.group(1)
+        # -L so we follow the post-login redirect; the session cookie is
+        # rewritten by Gitea on successful login to carry uid.
+        client.succeed(
+            "curl -4 -s -L -o /dev/null "
+            "-b /tmp/cookies.txt -c /tmp/cookies.txt "
+            f"--data-urlencode '_csrf={csrf}' "
+            "--data-urlencode 'user_name=testuser' "
+            "--data-urlencode 'password=testpassword' "
+            "http://server/user/login"
+        )
+        # Sanity-check the session by hitting the gated probe directly —
+        # the post-login cookie jar MUST drive /user/stopwatches to 200.
+        probe = client.succeed(
+            "curl -4 -s -o /dev/null -w '%{http_code}' "
+            "-b /tmp/cookies.txt http://server/user/stopwatches"
+        ).strip()
+        assert probe == "200", f"session auth probe expected 200, got {probe!r}"
+        return "/tmp/cookies.txt"
+
    with subtest("Anonymous /{user}/{repo}/actions redirects to login"):
        result = curl("http://server/foo/bar/actions")
        code, _, redir = result.partition("|")
@@ -132,6 +169,9 @@ pkgs.testers.runNixOSTest {
        assert code == "302", f"expected 302, got {code!r} (full: {result!r})"
        assert "/user/login" in redir, f"expected login redirect, got {redir!r}"
        assert "redirect_to=" in redir, f"expected redirect_to param, got {redir!r}"
+        assert "/foo/bar/actions" in redir, (
+            f"expected original URL preserved in redirect_to, got {redir!r}"
+        )

    with subtest("Anonymous deep /actions paths also redirect"):
        for path in ["/foo/bar/actions/", "/foo/bar/actions/runs/1", "/foo/bar/actions/workflows/build.yaml"]:
@@ -142,8 +182,6 @@ pkgs.testers.runNixOSTest {
            assert "/user/login" in redir, f"{path}: expected login redirect, got {redir!r}"

    with subtest("Anonymous workflow badge stays public"):
-        # Repo doesn't exist → Gitea answers 404, but Caddy does NOT intercept
-        # with a login redirect so README badges keep rendering.
        result = curl("http://server/foo/bar/actions/workflows/ci.yaml/badge.svg")
        code, _, redir = result.partition("|")
        print(f"anon badge -> {result!r}")
@@ -151,17 +189,18 @@ pkgs.testers.runNixOSTest {
            f"badge path should not redirect to login, got {result!r}"
        )

-    with subtest("Authenticated /{user}/{repo}/actions does not redirect to login"):
+    cookies = login()
+
+    with subtest("Session-authenticated /{user}/{repo}/actions reaches Gitea"):
        result = curl(
-            "-u testuser:testpassword "
-            "http://server/testuser/nonexistent/actions"
+            "http://server/testuser/nonexistent/actions", cookies=cookies
        )
        code, _, redir = result.partition("|")
        print(f"auth /testuser/nonexistent/actions -> {result!r}")
-        # Gitea will 404 the missing repo — the key assertion is that the
-        # Caddy gate let the request through instead of redirecting to login.
+        # Gitea returns 404 for the missing repo — the key assertion is that
+        # Caddy's gate forwarded the request instead of redirecting to login.
        assert not (code == "302" and "/user/login" in redir), (
-            f"authenticated actions request was intercepted by login gate: {result!r}"
+            f"session-authed actions request was intercepted by login gate: {result!r}"
        )

    with subtest("Anonymous /explore/repos is served without gating"):
--- a/tests/tests.nix
+++ b/tests/tests.nix
@@ -13,6 +13,7 @@ in
  minecraftTest = handleTest ./minecraft.nix;
  jellyfinQbittorrentMonitorTest = handleTest ./jellyfin-qbittorrent-monitor.nix;
  deployGuardTest = handleTest ./deploy-guard.nix;
+  deployFinalizeTest = handleTest ./deploy-finalize.nix;
  filePermsTest = handleTest ./file-perms.nix;

  # fail2ban tests
Author	SHA1	Message	Date
Simon Gardling	0a8b863e4b	gitea: fix actions visibility All checks were successful Build and Deploy / mreow (push) Successful in 2m39s Details Build and Deploy / yarn (push) Successful in 1m48s Details Build and Deploy / muffin (push) Successful in 1m14s Details	2026-04-22 23:02:53 -04:00
Simon Gardling	0901f5edf0	deploy: potentially fix self-deploy issue?	2026-04-22 23:02:38 -04:00