Compare commits

...

129 Commits

Author SHA1 Message Date
4bc5d57fa6 jellyfin: restartTriggers on webhook plugin so install runs at activation
The jellyfin-webhook-install oneshot has 'wantedBy = jellyfin.service',
which only runs it when jellyfin (re)starts. On first rollout to a host
where jellyfin is already running, the unit gets added but never fires,
leaving the Webhook plugin files absent -- jellyfin-webhook-configure
then gets 404 from /Plugins/$GUID/Configuration and deploy-rs rolls back.

Pinning jellyfin.restartTriggers to the plugin package + install script
forces a restart whenever either derivation changes, which pulls install
in via the existing before/wantedBy chain.
2026-04-17 22:08:29 -04:00
1403c9d3bc jellyfin-qbittorrent-monitor: add webhook receiver for instant throttling
Some checks failed
Build and Deploy / deploy (push) Failing after 2m9s
2026-04-17 19:47:29 -04:00
48ac68c297 jellyfin: add webhook plugin helper 2026-04-17 19:47:26 -04:00
fc548a137f patches/nixpkgs: add jellyfin declarative network.xml options 2026-04-17 19:47:23 -04:00
9ea45d4558 hardware: tighten mq-deadline read_expire for jellyfin coexistence 2026-04-17 19:47:20 -04:00
cebdd3ea96 arr: fix prowlarrUrl for cross-netns reachability
All checks were successful
Build and Deploy / deploy (push) Successful in 1m47s
Prowlarr runs in the wg VPN namespace; Sonarr/Radarr run in the host
namespace. Configuring the Prowlarr sync with prowlarrUrl=localhost:9696
made Sonarr/Radarr try to connect to their own localhost, where
Prowlarr does not exist — the host netns. Every indexer sync emitted
'Prowlarr URL is invalid' with Connection refused (localhost:9696).

Use vpnNamespaces.wg.namespaceAddress (192.168.15.1) so host-netns
clients hit the wg-side veth where Prowlarr is listening.

Also re-enables healthChecks on prowlarr-init: the /applications/testall
endpoint now validates clean (manually verified via API).
2026-04-17 00:53:24 -04:00
df57d636f5 arr: declare critical config.xml elements via configXml
All checks were successful
Build and Deploy / deploy (push) Successful in 2m43s
Pin <Port>, <BindAddress>, and <EnableSsl> in each arr service's
config.xml through arr-init's new configXml option. A preStart hook
ensures these elements exist before the service reads its config,
fixing the recurring Prowlarr bug where <Port> was absent from
config.xml and the service would run without binding any socket.

Updates arr-init lock to 6dde2a3.
2026-04-17 00:47:08 -04:00
2f09c800e0 update arr-init
All checks were successful
Build and Deploy / deploy (push) Successful in 3m43s
2026-04-17 00:38:44 -04:00
2c67b9729b arr-init: fix prowlarr health check failure
All checks were successful
Build and Deploy / deploy (push) Successful in 2m59s
Disable health checks on Prowlarr -- the synced-app testall endpoint
requires Sonarr/Radarr to reverse-connect to prowlarrUrl, which is
unreachable across the wg namespace boundary.

Also add networkNamespaceService = "wg" for the new configurable
namespace service dependency (replaces old hardcoded wg.service).
2026-04-16 17:45:19 -04:00
7d77926f8a update arr-init
Some checks failed
Build and Deploy / deploy (push) Failing after 4m43s
2026-04-16 17:33:54 -04:00
2aa401a9ef update
Some checks failed
Build and Deploy / deploy (push) Failing after 3m7s
2026-04-16 16:47:27 -04:00
92f44d6c71 Reapply "minecraft: tweak jvm args"
All checks were successful
Build and Deploy / deploy (push) Successful in 55s
This reverts commit 82a383482e.
2026-04-16 14:35:28 -04:00
daae941d36 minecraft: 1.21.1 -> 26.1.2 2026-04-16 14:35:23 -04:00
5990319445 jellyfin: fix caddy reverse proxy
All checks were successful
Build and Deploy / deploy (push) Successful in 2m46s
2026-04-16 01:30:10 -04:00
55fda4b5ee update (including llamacpp)
All checks were successful
Build and Deploy / deploy (push) Successful in 2m11s
2026-04-15 21:30:06 -04:00
20ca945436 qbt: create timer to flush WAL
All checks were successful
Build and Deploy / deploy (push) Successful in 2m45s
2026-04-15 18:46:26 -04:00
aecd9002b0 zfs tuning 2026-04-15 18:25:56 -04:00
48efd7fcf7 qbittorrent: fix (?) perms 2026-04-15 18:25:56 -04:00
0289ce0856 xmrig-auto-pause: tweak resume_threshold 2026-04-15 18:25:56 -04:00
5b98e6197e kernel: rollback to 6.12
Major ZFS issue causing deadlocks on my system:
https://github.com/openzfs/zfs/issues/18426
2026-04-15 18:25:55 -04:00
a0085187a9 fix systemd-tmpfiles
All checks were successful
Build and Deploy / deploy (push) Successful in 3m14s
2026-04-14 21:59:08 -04:00
0c70c2b2b4 add infra for providing updates to yarn 2026-04-14 20:55:39 -04:00
f28dd190bf move off of hardened kernel to latest LTS 2026-04-14 20:04:26 -04:00
a01452bd59 gitea-actions-runner: increase timeout to 6h
Some checks failed
Build and Deploy / deploy (push) Has been cancelled
2026-04-14 18:09:57 -04:00
140330e98d update
All checks were successful
Build and Deploy / deploy (push) Successful in 8m8s
2026-04-13 20:01:36 -04:00
28df0a7f06 jellyseerr: declarative quality profile defaults via arr-init 2026-04-13 19:59:47 -04:00
4aa7c2a44b recyclarr: enforce as sole authority over custom formats 2026-04-13 03:17:03 -04:00
e0c86a956e llama.cpp: disable
All checks were successful
Build and Deploy / deploy (push) Successful in 1m26s
2026-04-12 22:37:05 -04:00
e904e249ed recyclarr: ensure restart on config change
All checks were successful
Build and Deploy / deploy (push) Successful in 1m36s
2026-04-12 22:26:07 -04:00
55001bbe75 recyclarr: hopefully prevent ai upscale torrents
All checks were successful
Build and Deploy / deploy (push) Successful in 1m22s
2026-04-12 22:17:51 -04:00
053160fb36 recyclarr: add upscaled custom format to block fake 2160p
All checks were successful
Build and Deploy / deploy (push) Successful in 1m16s
2026-04-12 21:38:11 -04:00
19ea2dc02b prowlarr: handle bitmagnet restart
All checks were successful
Build and Deploy / deploy (push) Successful in 1m12s
2026-04-12 21:30:08 -04:00
dbf6d2f832 Revert "traccar: init"
Some checks failed
Build and Deploy / deploy (push) Failing after 51s
This reverts commit acfa08fc2e.
2026-04-12 21:04:28 -04:00
acfa08fc2e traccar: init 2026-04-12 21:04:16 -04:00
1f2886d35c AGENTS.md: document postgresql-first policy 2026-04-12 21:04:08 -04:00
674d3cf539 fix tests 2026-04-12 15:36:04 -04:00
bef4ac7ddc update
Some checks failed
Build and Deploy / deploy (push) Failing after 10m32s
2026-04-11 10:28:01 -04:00
12469de580 llama.cpp: things 2026-04-11 10:27:38 -04:00
dad3867144 grafana: fix llama-cpp annotation query format for Grafana 12
All checks were successful
Build and Deploy / deploy (push) Successful in 2m42s
Grafana 12 expects Prometheus annotation queries wrapped in a 'target'
object with datasource, expr, refId, and range fields. The previous
format had expr/step as top-level fields which Grafana silently ignored.
2026-04-09 22:19:21 -04:00
7ee55eca6b typo: systemd.service -> systemd.services
Some checks failed
Build and Deploy / deploy (push) Failing after 15m58s
2026-04-09 20:48:06 -04:00
100999734b ddns-updater: disable DynamicUser to fix secret perms
Some checks failed
Build and Deploy / deploy (push) Failing after 10s
2026-04-09 20:47:04 -04:00
ce1c335230 caddy: wildcard TLS via DNS-01 challenge + ddns-updater for Njalla
Some checks failed
Build and Deploy / deploy (push) Failing after 31m3s
Build Caddy with the caddy-dns/njalla plugin to enable DNS-01 ACME
challenges. This issues a single wildcard certificate for
*.sigkill.computer instead of per-subdomain certificates, reducing
Let's Encrypt API calls and certificate management overhead.

Add ddns-updater service (nixpkgs services.ddns-updater) configured
with Njalla provider to automatically update DNS records when the
server's public IP changes.
2026-04-09 19:54:57 -04:00
e9ce1ce0a2 grafana: replace llama-cpp-annotations daemon with prometheus query 2026-04-09 19:54:57 -04:00
a3a6700106 grafana: replace disk-usage-collector with prometheus-zfs-exporter
The custom disk-usage-collector shell script + minutely timer is replaced
by prometheus-zfs-exporter (pdf/zfs_exporter, packaged in nixpkgs as
services.prometheus.exporters.zfs). The exporter provides pool capacity
metrics (allocated/free/size) natively.

Partition metrics (/boot, /persistent, /nix) now use node_exporter's
built-in filesystem collector (node_filesystem_*_bytes) which already
runs and collects these metrics.

Also fixes a latent race condition in serviceMountWithZpool: the -mounts
service now orders after zfs-mount.service (which runs 'zfs mount -a'),
not just after pool import. Without this, the mount check could run
before datasets are actually mounted.
2026-04-09 19:54:57 -04:00
75319256f3 lib: add mkCaddyReverseProxy, mkFail2banJail, mkGrafanaAnnotationService, extractArrApiKey 2026-04-09 19:54:57 -04:00
c74d356595 xmrig: compile with compiler optimizations
All checks were successful
Build and Deploy / deploy (push) Successful in 2m45s
2026-04-09 16:25:30 -04:00
ae03c2f288 p2pool: don't disable on power loss
p2pool is very light on resources, it's xmrig that should be disabled
2026-04-09 14:44:13 -04:00
0d87f90657 gitea: make gitea-runner wait for gitea.service
Some checks failed
Build and Deploy / deploy (push) Failing after 4m18s
prevents spam on ntfy
2026-04-09 14:16:05 -04:00
d1e9c92423 update
Some checks failed
Build and Deploy / deploy (push) Failing after 4s
2026-04-09 14:03:34 -04:00
4f33b16411 llama.cpp: thing 2026-04-09 14:02:53 -04:00
4f41789995 Reapply "llama-cpp: enable"
All checks were successful
Build and Deploy / deploy (push) Successful in 6m43s
This reverts commit 645a532ed7.
2026-04-07 22:49:53 -04:00
c0390af1a4 llama-cpp: update
All checks were successful
Build and Deploy / deploy (push) Successful in 2m33s
2026-04-07 22:29:02 -04:00
98310f2582 organize patches + add gemma4 patch
All checks were successful
Build and Deploy / deploy (push) Successful in 2m41s
2026-04-07 20:57:54 -04:00
645a532ed7 Revert "llama-cpp: enable"
All checks were successful
Build and Deploy / deploy (push) Successful in 1m52s
This reverts commit fdc1596bce.
2026-04-07 20:23:48 -04:00
2884a39eb1 llama-cpp: patch for vulkan support instead
All checks were successful
Build and Deploy / deploy (push) Successful in 7m23s
2026-04-07 20:07:02 -04:00
fdc1596bce llama-cpp: enable
All checks were successful
Build and Deploy / deploy (push) Successful in 7m16s
2026-04-07 19:15:56 -04:00
778b04a80f Reapply "llama-cpp: maybe use vulkan?"
All checks were successful
Build and Deploy / deploy (push) Successful in 2m17s
This reverts commit 9addb1569a.
2026-04-07 19:12:57 -04:00
88fc219f2d update 2026-04-07 19:11:50 -04:00
a5c7c91e38 Power: disable a bunch of things
All checks were successful
Build and Deploy / deploy (push) Successful in 1m42s
BROKE intel arc A380 completely because it was forced into L1.1/L1.2
pcie substates. Forcewaking the device would fail and it would never come up.

So I will be more conservative on power saving tuning.
2026-04-07 19:08:08 -04:00
628c16fe64 fix git-crypt key for dotfiles workflow
All checks were successful
Build and Deploy / deploy (push) Successful in 2m32s
2026-04-07 13:51:19 -04:00
0df5d98770 grafana: use postgresql
All checks were successful
Build and Deploy / deploy (push) Successful in 2m45s
Not used for metric data, only for annotations and other Grafana state
2026-04-07 12:44:59 -04:00
2848c7e897 grafana: keep data forever 2026-04-07 12:44:46 -04:00
e57c9cb83b xmrig-auto-pause: raise thresholds for server background load
All checks were successful
Build and Deploy / deploy (push) Successful in 1m59s
2026-04-07 01:09:16 -04:00
d48f27701f xmrig-auto-pause: add hysteresis to prevent stop/start thrashing
xmrig's RandomX pollutes the L3 cache, making other processes appear
~3-8% busier. With a single 5% threshold for both stopping and
resuming, the script oscillates: start xmrig -> cache pressure
inflates CPU -> stop xmrig -> CPU drops -> restart -> repeat.

Split into CPU_STOP_THRESHOLD (15%) and CPU_RESUME_THRESHOLD (5%).
The stop threshold sits above xmrig's indirect pressure, so only
genuine workloads trigger a pause. The resume threshold confirms the
system is truly idle before restarting.
2026-04-07 01:09:06 -04:00
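
The two-threshold scheme described in d48f27701f can be sketched as a small state function. This is a minimal illustration, not the script from the repository: `next_state` and `simulate` are hypothetical names, and whether the real comparisons are inclusive is an assumption. Only the threshold names and the 15%/5% values come from the commit message.

```python
CPU_STOP_THRESHOLD = 15.0    # percent, value taken from the commit message
CPU_RESUME_THRESHOLD = 5.0   # percent, value taken from the commit message


def next_state(mining: bool, cpu_percent: float) -> bool:
    """Return whether xmrig should be running after one CPU sample.

    cpu_percent is assumed to be the CPU usage of everything except xmrig.
    """
    if mining and cpu_percent >= CPU_STOP_THRESHOLD:
        return False  # genuine workload: pause mining
    if not mining and cpu_percent <= CPU_RESUME_THRESHOLD:
        return True   # system is idle again: resume mining
    return mining     # inside the dead band: keep the current state


def simulate(samples, mining=True):
    """Feed a sequence of CPU samples through next_state and record each state."""
    states = []
    for cpu in samples:
        mining = next_state(mining, cpu)
        states.append(mining)
    return states
```

Readings between the two thresholds leave the state unchanged, so the 3-8% of cache-pressure noise mentioned in the commit can no longer flip xmrig on and off.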
738861fd53 lanzaboote: fix was upstreamed 2026-04-06 19:21:20 -04:00
274ef40ccc lanzaboote: pin to fork with pcrlock reinstall fix
Some checks failed
Build and Deploy / deploy (push) Failing after 3h15m29s
Upstream PR: https://github.com/nix-community/lanzaboote/pull/566
2026-04-06 16:08:57 -04:00
a76a7969d9 nix-cache
Some checks failed
Build and Deploy / deploy (push) Failing after 1h17m39s
2026-04-06 14:21:31 -04:00
4be2eaed35 Reapply "update"
Some checks failed
Build and Deploy / deploy (push) Failing after 10m49s
This reverts commit 655bbda26f.
2026-04-06 13:40:52 -04:00
655bbda26f Revert "update"
All checks were successful
Build and Deploy / deploy (push) Successful in 1m19s
This reverts commit 960259b0d0.
2026-04-06 13:39:32 -04:00
3b8aedd502 fix hardened kernel with nix sandbox 2026-04-06 13:36:38 -04:00
960259b0d0 update
Some checks failed
Build and Deploy / deploy (push) Failing after 2m14s
2026-04-06 13:12:50 -04:00
5fa6f37b28 llama-cpp: disable 2026-04-06 13:12:06 -04:00
7afd1f35d2 xmrig-auto-pause: fix 2026-04-06 13:11:54 -04:00
a12dcb01ec llama-cpp: remove folder 2026-04-06 12:48:28 -04:00
6d47f02a0f llama-cpp: set batch size to 4096
All checks were successful
Build and Deploy / deploy (push) Successful in 1m22s
2026-04-06 02:29:37 -04:00
9addb1569a Revert "llama-cpp: maybe use vulkan?"
This reverts commit 0a927ea893.
2026-04-06 02:28:26 -04:00
df04e36b41 llama-cpp: fix vulkan cache
Some checks failed
Build and Deploy / deploy (push) Failing after 1m23s
2026-04-06 02:23:29 -04:00
0a927ea893 llama-cpp: maybe use vulkan?
All checks were successful
Build and Deploy / deploy (push) Successful in 8m30s
2026-04-06 02:12:46 -04:00
3e46c5bfa5 llama-cpp: use turbo3 for everything
All checks were successful
Build and Deploy / deploy (push) Successful in 1m20s
2026-04-06 01:53:11 -04:00
06aee5af77 llama-cpp: gemma 4 E4B -> gemma 4 E2B
All checks were successful
Build and Deploy / deploy (push) Successful in 2m5s
2026-04-06 01:24:25 -04:00
8fddd3a954 llama-cpp: context: 32768 -> 65536
All checks were successful
Build and Deploy / deploy (push) Successful in 2m58s
2026-04-06 01:04:23 -04:00
0e4f0d3176 llama-cpp: fix model name
All checks were successful
Build and Deploy / deploy (push) Successful in 1m18s
2026-04-06 00:59:20 -04:00
bbcd662c28 xmrig-auto-pause: fix stuck state after external restart, add startup cooldown
All checks were successful
Build and Deploy / deploy (push) Successful in 8m47s
Two bugs found during live verification on the server:

1. Stuck state after external restart: if something else restarted xmrig
   (e.g. deploy-rs activation) while paused_by_us=True, the script never
   detected this and became permanently stuck — unable to stop xmrig on
   future load because it thought xmrig was already stopped.

   Fix: when paused_by_us=True and busy, check if xmrig is actually
   running. If so, reset paused_by_us=False and re-stop it.

2. Flapping on xmrig restart: RandomX dataset init takes ~3.7s of intense
   non-nice CPU, which the script detected as real workload and immediately
   re-stopped xmrig after every restart, creating a start-stop loop.

   Fix: add STARTUP_COOLDOWN (default 10s) — after starting xmrig, skip
   CPU checks until the cooldown expires.

Both bugs were present in production: the script had been stuck since
Apr 3 (2+ days) with xmrig running unmanaged alongside llama-server.
2026-04-05 23:20:47 -04:00
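
Both fixes from bbcd662c28 can be sketched together in one small controller. This is an illustrative reconstruction under assumptions: the class name, the injected `is_running`/`start`/`stop` callables, and the `clock` parameter are hypothetical. Only `paused_by_us`, `STARTUP_COOLDOWN`, and its 10s default come from the commit message.

```python
import time

STARTUP_COOLDOWN = 10.0  # seconds, default named in the commit message


class PauseController:
    """Hypothetical sketch of the pause logic; not the repository's script."""

    def __init__(self, is_running, start, stop, clock=time.monotonic):
        self.is_running = is_running  # callable: is xmrig currently active?
        self.start = start            # callable: start xmrig
        self.stop = stop              # callable: stop xmrig
        self.clock = clock
        self.paused_by_us = False
        self.started_at = None        # time of our last start, if any

    def tick(self, busy: bool) -> None:
        now = self.clock()
        # Fix 2: skip CPU checks while RandomX dataset init inflates the load.
        if self.started_at is not None and now - self.started_at < STARTUP_COOLDOWN:
            return
        if busy:
            running = self.is_running()
            if self.paused_by_us and running:
                # Fix 1: something else restarted xmrig, so our flag is stale.
                self.paused_by_us = False
            if not self.paused_by_us and running:
                self.stop()
                self.paused_by_us = True
        elif self.paused_by_us:
            self.start()
            self.paused_by_us = False
            self.started_at = now
```

Injecting the clock and the service callables keeps the logic testable without systemd, which is one way the stuck state could be reproduced outside production.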
324a9123db better organize related monero and matrix services
All checks were successful
Build and Deploy / deploy (push) Successful in 2m48s
2026-04-04 14:32:26 -04:00
8ea96c8b8e llama-cpp: fix model hash
All checks were successful
Build and Deploy / deploy (push) Successful in 2m36s
2026-04-04 00:28:07 -04:00
3f62b9c88e grafana: replace custom metric collectors with community exporters
Replace three custom Prometheus textfile collector scripts with
dedicated community-maintained exporters:

- jellyfin-collector.nix (25 LoC shell) -> rebelcore/jellyfin_exporter
  Metric: jellyfin_active_streams -> count(jellyfin_now_playing_state)
  Bonus: per-session labels (user, title, device, codec info)

- qbittorrent-collector.nix (40 LoC shell) -> anriha/qbittorrent-metrics-exporter
  Metric: qbittorrent_{download,upload}_bytes_per_second -> qbit_{dl,up}speed
  Bonus: per-torrent metrics with category/tag aggregation

- intel-gpu-collector.nix + .py (130 LoC Python) -> mike1808/igpu-exporter
  Metric: intel_gpu_engine_busy_percent -> igpu_engines_busy_percent
  Bonus: persistent daemon vs oneshot timer, no streaming JSON parser

All three run as persistent daemons scraped by Prometheus, replacing
the textfile-collector pattern of systemd timers writing .prom files.
Dashboard PromQL queries updated to match new metric names.
2026-04-03 15:38:13 -04:00
479ec43b8f llama-cpp: integrate native prometheus /metrics endpoint
llama.cpp server has a built-in /metrics endpoint exposing
prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total,
n_decode_total, and n_busy_slots_per_decode. Enable it with --metrics
and add a Prometheus scrape target, replacing the need for any external
metric collection for LLM inference monitoring.
2026-04-03 15:19:11 -04:00
37ac88fc0f lib: replace deprecated overrideDerivation with overrideAttrs
overrideDerivation has been deprecated since 2019. The new
overrideAttrs properly handles the env attribute set used by
modern derivations to avoid the NIX_CFLAGS_COMPILE overlap
error between env and top-level derivation arguments.
2026-04-03 15:18:22 -04:00
47aeb58f7a llama-cpp: do logging
All checks were successful
Build and Deploy / deploy (push) Successful in 2m27s
2026-04-03 14:39:46 -04:00
daf82c16ba fix xmrig pause 2026-04-03 14:39:20 -04:00
d4d01d63f1 llama-cpp: update + re-enable + gemma 4 E4B
Some checks failed
Build and Deploy / deploy (push) Failing after 20m16s
2026-04-03 14:06:35 -04:00
e765a98487 recyclarr: reset back to default basically
All checks were successful
Build and Deploy / deploy (push) Successful in 2m15s
2026-04-03 13:45:26 -04:00
124d33963e organize
All checks were successful
Build and Deploy / deploy (push) Successful in 2m43s
2026-04-03 00:47:12 -04:00
1451f902ad grafana: re-organize 2026-04-03 00:39:42 -04:00
8e6619097d update
Some checks failed
Build and Deploy / deploy (push) Failing after 4m59s
2026-04-03 00:20:13 -04:00
c2ff07b329 llama-cpp: disable 2026-04-03 00:17:38 -04:00
9e235abf48 monitoring: fix disk-usage-collector timer calendar spec
All checks were successful
Build and Deploy / deploy (push) Successful in 2m14s
2026-04-03 00:17:21 -04:00
096ffeb943 llama-cpp: xmrig + grafana hooks 2026-04-03 00:17:17 -04:00
ab9c12cb97 llama-cpp: general changes 2026-04-03 00:17:14 -04:00
0aeb6c5523 llama-cpp: add API key auth via --api-key-file
Some checks failed
Build and Deploy / deploy (push) Failing after 2m49s
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
2026-04-02 18:02:23 -04:00
bfe7a65db2 monitoring: add zpool and boot partition usage metrics
Add textfile collector for ZFS pool utilization (tank, hdds) and
boot drive partitions (/boot, /persistent, /nix). Runs every 60s.
Add two Grafana dashboard panels: ZFS Pool Utilization and Boot
Drive Partitions as Row 5.
2026-04-02 18:02:23 -04:00
e41f869843 trilium: add self-hosted note-taking service
Add trilium-server on port 8787 behind Caddy reverse proxy at
notes.sigkill.computer. Data stored on ZFS tank pool with
serviceMountWithZpool for mount ordering.
2026-04-02 17:44:04 -04:00
9baeaa5c23 llama-cpp: add grafana annotations for inference requests
Poll /slots endpoint, create annotations when slots start processing,
close with token count when complete. Includes NixOS VM test with
mock llama-cpp and grafana servers. Dashboard annotation entry added.
2026-04-02 17:43:49 -04:00
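
The core of the annotation daemon in 9baeaa5c23 is detecting slot transitions between two polls. The sketch below shows only that step, assuming each poll has already been reduced to a mapping from slot id to a processing flag. `diff_slots` is a hypothetical name, and the Grafana calls that would open and close annotations are left out.

```python
def diff_slots(previous, current):
    """Compare two polls of slot state.

    previous/current: dict mapping slot id -> True if the slot is processing.
    Returns (started, finished): slot ids that began or ended a request.
    """
    started = [sid for sid, busy in current.items()
               if busy and not previous.get(sid, False)]
    finished = [sid for sid, busy in current.items()
                if not busy and previous.get(sid, False)]
    return started, finished
```

A caller would create an annotation for each id in `started` and close the matching one, with the token count, for each id in `finished`.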
0235617627 monitoring: fix intel-gpu-collector crash resilience
Wrap entire read_one_sample() in try/except to handle all failures
(missing binary, permission errors, malformed JSON, timeouts).
Write zero-valued metrics on failure instead of exiting non-zero.
Increase timeout from 5s to 8s for slower GPU initialization.
2026-04-02 17:43:13 -04:00
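
The resilience pattern from 0235617627 (catch every failure, then emit zero-valued metrics instead of exiting non-zero) might look roughly like this. The JSON layout, the engine names, and the `render_metrics` helper are assumptions made for illustration. The function name `read_one_sample`, the 8s timeout, and the metric name `intel_gpu_engine_busy_percent` appear in the commit log.

```python
import json
import subprocess

TIMEOUT_SECONDS = 8  # raised from 5s according to the commit message


def read_one_sample(command):
    """Run the sampling command and parse its JSON output.

    Returns a dict on success and None on any failure (missing binary,
    permission error, malformed JSON, timeout).
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True,
            timeout=TIMEOUT_SECONDS, check=True,
        )
        data = json.loads(result.stdout)
        return data if isinstance(data, dict) else None
    except Exception:
        return None


def render_metrics(sample, engines=("Render/3D", "Video")):
    """Render Prometheus text lines, falling back to 0.0 when data is missing."""
    lines = []
    for engine in engines:
        try:
            busy = float(sample["engines"][engine]["busy"])  # assumed layout
        except (TypeError, KeyError, ValueError):
            busy = 0.0
        lines.append(f'intel_gpu_engine_busy_percent{{engine="{engine}"}} {busy}')
    return "\n".join(lines) + "\n"
```

Writing zeros keeps the `.prom` file fresh, so a failed sample shows up as an idle GPU rather than as a failed unit.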
df15be01ea llama-cpp: pause xmrig during active inference requests
Add sidecar service that polls llama-cpp /slots endpoint every 3s.
When any slot is processing, stops xmrig. Restarts xmrig after 10s
grace period when all slots are idle. Handles unreachable llama-cpp
gracefully (leaves xmrig untouched).
2026-04-02 17:43:07 -04:00
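
The decision rule described in df15be01ea can be written as a pure function, which makes the "unreachable means leave xmrig untouched" case explicit. This is a sketch under assumptions: `decide` and its return values are hypothetical, and the `is_processing` field name is assumed to match what the `/slots` endpoint reports. The 3s poll interval and 10s grace period come from the commit message.

```python
POLL_INTERVAL = 3.0   # seconds between polls, from the commit message
GRACE_PERIOD = 10.0   # seconds of idleness before xmrig restarts


def decide(slots, idle_since, now):
    """Decide what to do with xmrig after one poll.

    slots: list of slot dicts from /slots, or None if llama-cpp was unreachable.
    idle_since: time when all slots were first seen idle, or None.
    Returns (action, idle_since) where action is "stop", "start", or "leave".
    """
    if slots is None:
        return "leave", idle_since          # unreachable: do not touch xmrig
    if any(slot.get("is_processing") for slot in slots):
        return "stop", None                 # active inference: pause mining
    if idle_since is None:
        idle_since = now                    # first idle poll starts the timer
    if now - idle_since >= GRACE_PERIOD:
        return "start", idle_since
    return "leave", idle_since
```

A polling loop would call `decide` every `POLL_INTERVAL` seconds and act only when the returned action differs from the service's current state.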
50453cf0b5 llama-cpp: adjust args
All checks were successful
Build and Deploy / deploy (push) Successful in 2m32s
2026-04-02 16:09:17 -04:00
bb6ea2f1d5 llama-cpp: cpu only
All checks were successful
Build and Deploy / deploy (push) Successful in 20m0s
2026-04-02 15:32:39 -04:00
f342521d46 llama-cpp: re-add w/ turboquant
All checks were successful
Build and Deploy / deploy (push) Successful in 28m52s
2026-04-02 13:42:39 -04:00
7e779ca0f7 power optimizations 2026-04-02 13:13:38 -04:00
06b2016bd6 recyclarr: things 2026-04-01 20:37:18 -04:00
f9694ae033 qbt: fix categories
All checks were successful
Build and Deploy / deploy (push) Successful in 2m24s
2026-04-01 15:25:40 -04:00
f775f22dbf recyclarr: restart service after config change 2026-04-01 15:25:31 -04:00
1bb0844649 update
Some checks failed
Build and Deploy / deploy (push) Failing after 6m22s
2026-04-01 13:12:14 -04:00
297264a34a tests: extract shared jellyfin test helpers and use real jellyfin in annotations test
Some checks failed
Build and Deploy / deploy (push) Failing after 2m35s
2026-04-01 11:24:44 -04:00
a5206b9ec6 monitoring: add grafana annotations for zfs scrub events 2026-04-01 11:24:43 -04:00
3196b38db7 tests: extract shared mock grafana server from jellyfin test 2026-04-01 11:24:43 -04:00
59d33cea3d grafana: power improvement 2026-04-01 11:24:40 -04:00
fdf57873d7 prowlarr: fix perms
All checks were successful
Build and Deploy / deploy (push) Successful in 2m23s
2026-03-31 23:31:31 -04:00
f1b7679196 grafana: remove unused stuff
All checks were successful
Build and Deploy / deploy (push) Successful in 2m31s
2026-03-31 18:55:07 -04:00
5856d835ba grafana: qbt smoothing 2026-03-31 18:54:57 -04:00
a288e18e6d grafana: qbt stats
All checks were successful
Build and Deploy / deploy (push) Successful in 1m47s
2026-03-31 17:47:53 -04:00
c6b889cea3 grafana: more things
All checks were successful
Build and Deploy / deploy (push) Successful in 2m38s
1. Smoothed out power draw
- UPS only reports on 9 watt intervals, so smoothing it out gives more
relative detail on trends
2. Add jellyfin integration
- Good for seeing correlations between statistics and jellyfin streams
3. intel gpu stats
- Provides info on utilization of the gpu
2026-03-31 17:25:06 -04:00
0027489052 grafana: smooth power draw
All checks were successful
Build and Deploy / deploy (push) Successful in 2m29s
2026-03-31 13:20:07 -04:00
ebc4c66fc3 update
All checks were successful
Build and Deploy / deploy (push) Successful in 8m58s
2026-03-31 12:52:21 -04:00
bc227a89c1 remove old secrets
All checks were successful
Build and Deploy / deploy (push) Successful in 2m12s
2026-03-31 12:47:09 -04:00
e3be112b82 grafana: init
All checks were successful
Build and Deploy / deploy (push) Successful in 2m14s
Shows powerdraw, temps, uptime, and jellyfin streams
2026-03-31 12:38:43 -04:00
5375f8ee34 gitea: add actions runner and CI/CD deploy workflow
This will avoid me having to run "deploy" myself on my laptop.
All I will need to do is push a commit and it will self-deploy.
2026-03-31 12:38:43 -04:00
e4feaa35ad secrets: migrate build-time secrets to agenix runtime
- coturn: switch static-auth-secret to static-auth-secret-file
- matrix: switch registration_token and turn_secret to file-based
- murmur: switch password to environmentFile with agenix
- p2pool: move public wallet address to service-configs.nix
2026-03-31 12:38:43 -04:00
eaeeed7f45 fix mq-deadline for hdds: 3 2026-03-31 12:38:42 -04:00
99 changed files with 4859 additions and 509 deletions


@@ -0,0 +1,60 @@
name: Build and Deploy
on:
push:
branches: [main]
jobs:
deploy:
runs-on: nix
env:
GIT_SSH_COMMAND: "ssh -i /run/agenix/ci-deploy-key -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/etc/ci-known-hosts"
steps:
- uses: https://github.com/actions/checkout@v4
with:
fetch-depth: 0
- name: Unlock git-crypt
run: |
git-crypt unlock /run/agenix/git-crypt-key-server-config
- name: Build NixOS configuration
run: |
nix build .#nixosConfigurations.muffin.config.system.build.toplevel -L
- name: Deploy via deploy-rs
run: |
eval $(ssh-agent -s)
ssh-add /run/agenix/ci-deploy-key
nix run github:serokell/deploy-rs -- .#muffin --skip-checks --ssh-opts="-o StrictHostKeyChecking=yes -o UserKnownHostsFile=/etc/ci-known-hosts"
- name: Health check
run: |
sleep 10
ssh -i /run/agenix/ci-deploy-key -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/etc/ci-known-hosts root@server-public \
"systemctl is-active gitea && systemctl is-active caddy && systemctl is-active continuwuity && systemctl is-active coturn"
- name: Notify success
if: success()
run: |
TOPIC=$(cat /run/agenix/ntfy-alerts-topic | tr -d '[:space:]')
TOKEN=$(cat /run/agenix/ntfy-alerts-token | tr -d '[:space:]')
curl -sf -o /dev/null -X POST \
"https://ntfy.sigkill.computer/$TOPIC" \
-H "Authorization: Bearer $TOKEN" \
-H "Title: [muffin] Deploy succeeded" \
-H "Priority: default" \
-H "Tags: white_check_mark" \
-d "server-config deployed from commit ${GITHUB_SHA::8}"
- name: Notify failure
if: failure()
run: |
TOPIC=$(cat /run/agenix/ntfy-alerts-topic 2>/dev/null | tr -d '[:space:]')
TOKEN=$(cat /run/agenix/ntfy-alerts-token 2>/dev/null | tr -d '[:space:]')
curl -sf -o /dev/null -X POST \
"https://ntfy.sigkill.computer/$TOPIC" \
-H "Authorization: Bearer $TOKEN" \
-H "Title: [muffin] Deploy FAILED" \
-H "Priority: urgent" \
-H "Tags: rotating_light" \
-d "server-config deploy failed at commit ${GITHUB_SHA::8}" || true


@@ -99,7 +99,11 @@ Each service file in `services/` follows this structure:
- **git-crypt**: `secrets/` directory and `usb-secrets/usb-secrets-key*` are encrypted (see `.gitattributes`)
- **agenix**: secrets declared in `modules/age-secrets.nix`, decrypted at runtime to `/run/agenix/`
- **Identity**: USB drive at `/mnt/usb-secrets/usb-secrets-key`
- **Encrypting new secrets**: The agenix encryption key is in `usb-secrets/usb-secrets-key` (SSH private key, git-crypt encrypted). To create a new secret: derive the age public key with `ssh-keygen -y -f usb-secrets/usb-secrets-key | ssh-to-age`, then encrypt with `age -r <public-key> -o secrets/<name>.age`.
- **Encrypting new secrets**: The agenix identity is an SSH private key at `usb-secrets/usb-secrets-key` (git-crypt encrypted). To encrypt a new secret, use the SSH public key directly with `age -R`:
```bash
age -R <(ssh-keygen -y -f usb-secrets/usb-secrets-key) -o secrets/<name>.age /path/to/plaintext
```
- **DO NOT use `ssh-to-age`**. Using `ssh-to-age` to derive a native age public key and then encrypting with `age -r age1...` produces `X25519` recipient stanzas. The SSH private key identity on the server can only decrypt `ssh-ed25519` stanzas. This mismatch causes `age: error: no identity matched any of the recipients` at deploy time. Always use `age -R` with the SSH public key directly.
- Never read or commit plaintext secrets. Never log secret values.
### Important Patterns
@@ -108,6 +112,7 @@ Each service file in `services/` follows this structure:
- **Hugepages**: Services needing large pages declare their budget in `service-configs.nix` under `hugepages_2m.services`. The kernel sysctl is set automatically from the total.
- **Domain**: Primary domain is `sigkill.computer`. Old domain `gardling.com` redirects automatically.
- **Hardened kernel**: Uses `_hardened` kernel. Security-sensitive defaults apply.
- **PostgreSQL as central database**: All services that support PostgreSQL MUST use it instead of embedded databases (H2, SQLite, etc.). Connect via Unix socket with peer auth when possible (JDBC services can use junixsocket). The PostgreSQL instance is declared in `services/postgresql.nix` with ZFS-backed storage. Use `ensureDatabases`/`ensureUsers` to auto-create databases and roles.
### Test Pattern
Tests use `pkgs.testers.runNixOSTest` (NixOS VM tests):


@@ -20,17 +20,18 @@
./modules/no-rgb.nix
./modules/security.nix
./modules/ntfy-alerts.nix
./modules/power.nix
./services/postgresql.nix
./services/jellyfin.nix
./services/caddy.nix
./services/jellyfin
./services/caddy
./services/immich.nix
./services/gitea.nix
./services/gitea-actions-runner.nix
./services/minecraft.nix
./services/wg.nix
./services/qbittorrent.nix
./services/jellyfin-qbittorrent-monitor.nix
./services/bitmagnet.nix
./services/arr/prowlarr.nix
@@ -45,21 +46,19 @@
./services/soulseek.nix
# ./services/llama-cpp.nix
./services/trilium.nix
./services/ups.nix
./services/grafana
./services/bitwarden.nix
./services/firefox-syncserver.nix
./services/matrix.nix
./services/coturn.nix
./services/livekit.nix
./services/matrix
./services/monero.nix
./services/p2pool.nix
./services/xmrig.nix
# KEEP UNTIL 2028
./services/caddy_senior_project.nix
./services/monero
./services/graphing-calculator.nix
@@ -67,20 +66,28 @@
./services/syncthing.nix
./services/ntfy.nix
./services/ntfy-alerts.nix
./services/ntfy
./services/mollysocket.nix
./services/harmonia.nix
./services/ddns-updater.nix
];
services.kmscon.enable = true;
# Hosts entries for CI/CD deploy targets
networking.hosts."192.168.1.50" = [ "server-public" ];
networking.hosts."192.168.1.223" = [ "desktop" ];
systemd.targets = {
sleep.enable = false;
suspend.enable = false;
hibernate.enable = false;
hybrid-sleep.enable = false;
};
# SSH known_hosts for CI runner (pinned host keys)
environment.etc."ci-known-hosts".text = ''
server-public ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFMjgaMnE+zS7tL+m5E7gh9Q9U1zurLdmU0qcmEmaucu
192.168.1.50 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFMjgaMnE+zS7tL+m5E7gh9Q9U1zurLdmU0qcmEmaucu
git.sigkill.computer ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFMjgaMnE+zS7tL+m5E7gh9Q9U1zurLdmU0qcmEmaucu
git.gardling.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFMjgaMnE+zS7tL+m5E7gh9Q9U1zurLdmU0qcmEmaucu
'';
services.kmscon.enable = true;
# Disable serial getty on ttyS0 to prevent dmesg warnings
systemd.services."serial-getty@ttyS0".enable = false;
@@ -93,12 +100,6 @@
enable = false;
};
powerManagement = {
powertop.enable = true;
enable = true;
cpuFreqGovernor = "powersave";
};
# https://github.com/NixOS/nixpkgs/issues/101459#issuecomment-758306434
security.pam.loginLimits = [
{
@@ -121,14 +122,21 @@
};
};
hardware.intelgpu.driver = "xe";
# Intel Arc A380 (DG2, 56a5) uses the i915 driver on kernel 6.12.
# The xe driver's iHD media driver integration has buffer mapping
# failures on this GPU/kernel combination. i915 works correctly for
# VAAPI transcode as long as ASPM deep states are disabled for the
# GPU (see modules/power.nix).
hardware.intelgpu.driver = "i915";
# Per-service 2MB hugepage budget calculated in service-configs.nix.
boot.kernel.sysctl."vm.nr_hugepages" = service_configs.hugepages_2m.total_pages;
boot = {
# 6.12 LTS until 2026
kernelPackages = pkgs.linuxPackages_6_12_hardened;
# 6.12 LTS until 2027-03. Kernel 6.18 causes a reproducible ZFS deadlock
# in dbuf_evict due to page allocator changes (__free_frozen_pages).
# https://github.com/openzfs/zfs/issues/18426
kernelPackages = pkgs.linuxPackages_6_12;
loader = {
# Use the systemd-boot EFI boot loader.
@@ -249,6 +257,14 @@
users.groups.${service_configs.media_group} = { };
users.users.gitea-runner = {
isSystemUser = true;
group = "gitea-runner";
home = "/var/lib/gitea-runner";
description = "Gitea Actions CI runner";
};
users.groups.gitea-runner = { };
users.users.${username} = {
isNormalUser = true;
extraGroups = [
@@ -290,7 +306,8 @@
enable = true;
openFirewall = true;
welcometext = "meow meow meow meow meow :3 xd";
password = builtins.readFile ./secrets/murmur_password;
password = "$MURMURD_PASSWORD";
environmentFile = config.age.secrets.murmur-password-env.path;
port = service_configs.ports.public.murmur.port;
};

267
flake.lock generated

@@ -27,16 +27,17 @@
},
"arr-init": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1774681523,
"narHash": "sha256-K49RohIwbgzVeOdStfVDO83qy5K5ZLKWk4EsHJKj/k4=",
"lastModified": 1776401121,
"narHash": "sha256-BELV1YMBuLL0aQNQ3SLvSLq8YN5h2o1jcrwz1+Zt32Q=",
"ref": "refs/heads/main",
"rev": "f8475f6cb4d4d4df99002d07cf9583fb33b87876",
"revCount": 11,
"rev": "6dde2a3e0d087208b8084b61113707c5533c4c2d",
"revCount": 19,
"type": "git",
"url": "ssh://gitea@git.gardling.com/titaniumtown/arr-init"
},
@@ -102,6 +103,29 @@
"type": "github"
}
},
"fenix": {
"inputs": {
"nixpkgs": [
"qbittorrent-metrics-exporter",
"naersk",
"nixpkgs"
],
"rust-analyzer-src": "rust-analyzer-src"
},
"locked": {
"lastModified": 1752475459,
"narHash": "sha256-z6QEu4ZFuHiqdOPbYss4/Q8B0BFhacR8ts6jO/F/aOU=",
"owner": "nix-community",
"repo": "fenix",
"rev": "bf0d6f70f4c9a9cf8845f992105652173f4b617f",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "fenix",
"type": "github"
}
},
"flake-compat": {
"flake": false,
"locked": {
@@ -150,9 +174,45 @@
"type": "github"
}
},
"flake-parts": {
"inputs": {
"nixpkgs-lib": "nixpkgs-lib"
},
"locked": {
"lastModified": 1730504689,
"narHash": "sha256-hgmguH29K2fvs9szpq2r3pz2/8cJd2LPS+b4tfNFCwE=",
"owner": "hercules-ci",
"repo": "flake-parts",
"rev": "506278e768c2a08bec68eb62932193e341f55c90",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "flake-parts",
"type": "github"
}
},
"flake-utils": {
"inputs": {
"systems": "systems_4"
"systems": "systems_2"
},
"locked": {
"lastModified": 1731533236,
"narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "11707dc2f618dd54ca8739b309ec4fc024de578b",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_2": {
"inputs": {
"systems": "systems_6"
},
"locked": {
"lastModified": 1731533236,
@@ -197,11 +257,11 @@
]
},
"locked": {
"lastModified": 1774875830,
"narHash": "sha256-WPYlTmZvVa9dWlAziFkVjBdv1Z6giNIq40O1DxsBmiI=",
"lastModified": 1775425411,
"narHash": "sha256-KY6HsebJHEe5nHOWP7ur09mb0drGxYSzE3rQxy62rJo=",
"owner": "nix-community",
"repo": "home-manager",
"rev": "7afd8cebb99e25a64a745765920e663478eb8830",
"rev": "0d02ec1d0a05f88ef9e74b516842900c41f0f2fe",
"type": "github"
},
"original": {
@@ -263,11 +323,11 @@
"rust-overlay": "rust-overlay"
},
"locked": {
"lastModified": 1774858933,
"narHash": "sha256-rgHUoE4QhOvK3Rcl9cbuIVdjPjFjfhcTm/uPs8Y7+2w=",
"lastModified": 1776248416,
"narHash": "sha256-TC6yzbCAex1pDfqUZv9u8fVm8e17ft5fNrcZ0JRDOIQ=",
"owner": "nix-community",
"repo": "lanzaboote",
"rev": "45338aab3013924c75305f5cb3543b9cda993183",
"rev": "18e9e64bae15b828c092658335599122a6db939b",
"type": "github"
},
"original": {
@@ -276,20 +336,62 @@
"type": "github"
}
},
"llamacpp": {
"inputs": {
"flake-parts": "flake-parts",
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1776301820,
"narHash": "sha256-Yr3JRZ05PNmX4sR2Ak7e0jT+oCQgTAAML7FUoyTmitk=",
"owner": "TheTom",
"repo": "llama-cpp-turboquant",
"rev": "1073622985bb68075472474b4b0fdfcdabcfc9d0",
"type": "github"
},
"original": {
"owner": "TheTom",
"ref": "feature/turboquant-kv-cache",
"repo": "llama-cpp-turboquant",
"type": "github"
}
},
"naersk": {
"inputs": {
"fenix": "fenix",
"nixpkgs": "nixpkgs_2"
},
"locked": {
"lastModified": 1763384566,
"narHash": "sha256-r+wgI+WvNaSdxQmqaM58lVNvJYJ16zoq+tKN20cLst4=",
"owner": "nix-community",
"repo": "naersk",
"rev": "d4155d6ebb70fbe2314959842f744aa7cabbbf6a",
"type": "github"
},
"original": {
"owner": "nix-community",
"ref": "master",
"repo": "naersk",
"type": "github"
}
},
"nix-minecraft": {
"inputs": {
"flake-compat": "flake-compat_3",
"nixpkgs": [
"nixpkgs"
],
"systems": "systems_3"
"systems": "systems_4"
},
"locked": {
"lastModified": 1774896109,
"narHash": "sha256-0ue9vbpiLP5ZHZd5e7eQAplM9Rb4tunV+u5xcJDJ+lc=",
"lastModified": 1776310483,
"narHash": "sha256-xMFl+umxGmo5VEgcZcXT5Dk9sXU5WyTRz1Olpywr/60=",
"owner": "Infinidoge",
"repo": "nix-minecraft",
"rev": "cb646a9e33cfa6b7d5facd9b7d4ca738ad1ff953",
"rev": "74abd91054e2655d6c392428a27e5d27edd5e6bf",
"type": "github"
},
"original": {
@@ -300,11 +402,11 @@
},
"nixos-hardware": {
"locked": {
"lastModified": 1774777275,
"narHash": "sha256-qogBiYFq8hZusDPeeKRqzelBAhZvREc7Cl+qlewGUCg=",
"lastModified": 1775490113,
"narHash": "sha256-2ZBhDNZZwYkRmefK5XLOusCJHnoeKkoN95hoSGgMxWM=",
"owner": "NixOS",
"repo": "nixos-hardware",
"rev": "b8f81636927f1af0cca812d22c876bad0a883ccd",
"rev": "c775c2772ba56e906cbeb4e0b2db19079ef11ff7",
"type": "github"
},
"original": {
@@ -316,11 +418,11 @@
},
"nixpkgs": {
"locked": {
"lastModified": 1774388614,
"narHash": "sha256-tFwzTI0DdDzovdE9+Ras6CUss0yn8P9XV4Ja6RjA+nU=",
"lastModified": 1776221942,
"narHash": "sha256-FbQAeVNi7G4v3QCSThrSAAvzQTmrmyDLiHNPvTF2qFM=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "1073dad219cb244572b74da2b20c7fe39cb3fa9e",
"rev": "1766437c5509f444c1b15331e82b8b6a9b967000",
"type": "github"
},
"original": {
@@ -330,6 +432,18 @@
"type": "github"
}
},
"nixpkgs-lib": {
"locked": {
"lastModified": 1730504152,
"narHash": "sha256-lXvH/vOfb4aGYyvFmZK/HlsNsr/0CVWlwYvo2rxJk3s=",
"type": "tarball",
"url": "https://github.com/NixOS/nixpkgs/archive/cc2f28000298e1269cea6612cd06ec9979dd5d7f.tar.gz"
},
"original": {
"type": "tarball",
"url": "https://github.com/NixOS/nixpkgs/archive/cc2f28000298e1269cea6612cd06ec9979dd5d7f.tar.gz"
}
},
"nixpkgs-p2pool-module": {
"flake": false,
"locked": {
@@ -348,6 +462,22 @@
}
},
"nixpkgs_2": {
"locked": {
"lastModified": 1752077645,
"narHash": "sha256-HM791ZQtXV93xtCY+ZxG1REzhQenSQO020cu6rHtAPk=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "be9e214982e20b8310878ac2baa063a961c1bdf6",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixpkgs-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"nixpkgs_3": {
"locked": {
"lastModified": 1764517877,
"narHash": "sha256-pp3uT4hHijIC8JUK5MEqeAWmParJrgBVzHLNfJDZxg4=",
@@ -386,6 +516,28 @@
"type": "github"
}
},
"qbittorrent-metrics-exporter": {
"inputs": {
"naersk": "naersk",
"nixpkgs": [
"nixpkgs"
],
"systems": "systems_5"
},
"locked": {
"lastModified": 1771989937,
"narHash": "sha256-bPUV4gVvSbF4VMkbLKYrfwVwzTeS+Sr41wucDj1///g=",
"ref": "refs/heads/main",
"rev": "cb94f866b7a2738532b1cae31d0b9f89adecbd54",
"revCount": 112,
"type": "git",
"url": "https://codeberg.org/anriha/qbittorrent-metrics-exporter"
},
"original": {
"type": "git",
"url": "https://codeberg.org/anriha/qbittorrent-metrics-exporter"
}
},
"root": {
"inputs": {
"agenix": "agenix",
@@ -395,10 +547,12 @@
"home-manager": "home-manager",
"impermanence": "impermanence",
"lanzaboote": "lanzaboote",
"llamacpp": "llamacpp",
"nix-minecraft": "nix-minecraft",
"nixos-hardware": "nixos-hardware",
"nixpkgs": "nixpkgs",
"nixpkgs-p2pool-module": "nixpkgs-p2pool-module",
"qbittorrent-metrics-exporter": "qbittorrent-metrics-exporter",
"senior_project-website": "senior_project-website",
"srvos": "srvos",
"trackerlist": "trackerlist",
@@ -407,6 +561,23 @@
"ytbn-graphing-software": "ytbn-graphing-software"
}
},
"rust-analyzer-src": {
"flake": false,
"locked": {
"lastModified": 1752428706,
"narHash": "sha256-EJcdxw3aXfP8Ex1Nm3s0awyH9egQvB2Gu+QEnJn2Sfg=",
"owner": "rust-lang",
"repo": "rust-analyzer",
"rev": "591e3b7624be97e4443ea7b5542c191311aa141d",
"type": "github"
},
"original": {
"owner": "rust-lang",
"ref": "nightly",
"repo": "rust-analyzer",
"type": "github"
}
},
"rust-overlay": {
"inputs": {
"nixpkgs": [
@@ -452,11 +623,11 @@
"senior_project-website": {
"flake": false,
"locked": {
"lastModified": 1771869552,
"narHash": "sha256-veaVrRWCSy7HYAAjUFLw8HASKcj+3f0W+sCwS3QiaM4=",
"lastModified": 1775019649,
"narHash": "sha256-zVQy5ydiWKnIixf79pmd2LJTPkwyiv4V5piKZETDdwI=",
"owner": "Titaniumtown",
"repo": "senior-project-website",
"rev": "28a2b93492dac877dce0b38f078eacf74fce26e7",
"rev": "bfd504c77c90524b167158652e1d87a260680120",
"type": "github"
},
"original": {
@@ -472,11 +643,11 @@
]
},
"locked": {
"lastModified": 1774517972,
"narHash": "sha256-oPIVzGlMmfWuJlRbr87yU3cnV8NxtwTG92GqpQczlkw=",
"lastModified": 1776306894,
"narHash": "sha256-l4N3O1cfXiQCHJGspAkg6WlZyOFBTbLXhi8Anf8jB0g=",
"owner": "nix-community",
"repo": "srvos",
"rev": "0ddba2fbd72bb60f8b35b7de1ad67590f454d402",
"rev": "01d98209264c78cb323b636d7ab3fe8e7a8b60c7",
"type": "github"
},
"original": {
@@ -545,14 +716,44 @@
"type": "github"
}
},
"systems_5": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_6": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"trackerlist": {
"flake": false,
"locked": {
"lastModified": 1774822185,
"narHash": "sha256-vEVjI/PWBfxLd1dzlo1MUSqeH5J33OLU7a5dfuzOKu4=",
"lastModified": 1776290985,
"narHash": "sha256-eNWDOLBA0vk1TiKqse71siIAgLycjvBFDw35eAtnUPs=",
"owner": "ngosang",
"repo": "trackerslist",
"rev": "c77ccbef4be871a37dcbd3465d62bb8b5c7bc025",
"rev": "9bb380b3c2a641a3289f92dedef97016f2e47f36",
"type": "github"
},
"original": {
@@ -563,7 +764,7 @@
},
"utils": {
"inputs": {
"systems": "systems_2"
"systems": "systems_3"
},
"locked": {
"lastModified": 1731533236,
@@ -612,8 +813,8 @@
},
"ytbn-graphing-software": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs_2",
"flake-utils": "flake-utils_2",
"nixpkgs": "nixpkgs_3",
"rust-overlay": "rust-overlay_2"
},
"locked": {

View File

@@ -28,6 +28,11 @@
inputs.nixpkgs.follows = "nixpkgs";
};
llamacpp = {
url = "github:TheTom/llama-cpp-turboquant/feature/turboquant-kv-cache";
inputs.nixpkgs.follows = "nixpkgs";
};
srvos = {
url = "github:nix-community/srvos";
inputs.nixpkgs.follows = "nixpkgs";
@@ -78,6 +83,11 @@
url = "github:JacoMalan1/nixpkgs/create-p2pool-service";
flake = false;
};
qbittorrent-metrics-exporter = {
url = "git+https://codeberg.org/anriha/qbittorrent-metrics-exporter";
inputs.nixpkgs.follows = "nixpkgs";
};
};
outputs =
@@ -113,7 +123,7 @@
name = "nixpkgs-patched";
src = nixpkgs;
patches = [
./patches/0001-firefox-syncserver-add-postgresql-backend-support.patch
./patches/nixpkgs/0001-firefox-syncserver-add-postgresql-backend-support.patch
];
};

View File

@@ -46,6 +46,22 @@
group = "caddy";
};
# Njalla API token (NJALLA_API_TOKEN=...) for Caddy DNS-01 challenge
njalla-api-token-env = {
file = ../secrets/njalla-api-token-env.age;
mode = "0400";
owner = "caddy";
group = "caddy";
};
# ddns-updater config.json with Njalla provider credentials
ddns-updater-config = {
file = ../secrets/ddns-updater-config.age;
mode = "0400";
owner = "ddns-updater";
group = "ddns-updater";
};
jellyfin-api-key = {
file = ../secrets/jellyfin-api-key.age;
mode = "0400";
@@ -68,19 +84,19 @@
group = "root";
};
# ntfy-alerts secrets
# ntfy-alerts secrets (group-readable for CI runner notifications)
ntfy-alerts-topic = {
file = ../secrets/ntfy-alerts-topic.age;
mode = "0400";
mode = "0440";
owner = "root";
group = "root";
group = "gitea-runner";
};
ntfy-alerts-token = {
file = ../secrets/ntfy-alerts-token.age;
mode = "0400";
mode = "0440";
owner = "root";
group = "root";
group = "gitea-runner";
};
# Firefox Sync server secrets (SYNC_MASTER_SECRET)
@@ -94,5 +110,94 @@
file = ../secrets/mollysocket-env.age;
mode = "0400";
};
# Murmur (Mumble) server password
murmur-password-env = {
file = ../secrets/murmur-password-env.age;
mode = "0400";
owner = "murmur";
group = "murmur";
};
# Coturn static auth secret
coturn-auth-secret = {
file = ../secrets/coturn-auth-secret.age;
mode = "0400";
owner = "turnserver";
group = "turnserver";
};
# Matrix (continuwuity) registration token
matrix-reg-token = {
file = ../secrets/matrix-reg-token.age;
mode = "0400";
owner = "continuwuity";
group = "continuwuity";
};
# Matrix (continuwuity) TURN secret — same secret as coturn-auth-secret,
# decrypted separately so continuwuity can read it with its own ownership
matrix-turn-secret = {
file = ../secrets/coturn-auth-secret.age;
mode = "0400";
owner = "continuwuity";
group = "continuwuity";
};
# CI deploy SSH key
ci-deploy-key = {
file = ../secrets/ci-deploy-key.age;
mode = "0400";
owner = "gitea-runner";
group = "gitea-runner";
};
# Git-crypt symmetric key for dotfiles repo
git-crypt-key-dotfiles = {
file = ../secrets/git-crypt-key-dotfiles.age;
mode = "0400";
owner = "gitea-runner";
group = "gitea-runner";
};
# Git-crypt symmetric key for server-config repo
git-crypt-key-server-config = {
file = ../secrets/git-crypt-key-server-config.age;
mode = "0400";
owner = "gitea-runner";
group = "gitea-runner";
};
# Gitea Actions runner registration token
gitea-runner-token = {
file = ../secrets/gitea-runner-token.age;
mode = "0400";
owner = "gitea-runner";
group = "gitea-runner";
};
# llama-cpp API key for bearer token auth
llama-cpp-api-key = {
file = ../secrets/llama-cpp-api-key.age;
mode = "0400";
owner = "root";
group = "root";
};
# Harmonia binary cache signing key
harmonia-sign-key = {
file = ../secrets/harmonia-sign-key.age;
mode = "0400";
owner = "harmonia";
group = "harmonia";
};
# Caddy basic auth for nix binary cache (separate from main caddy_auth)
nix-cache-auth = {
file = ../secrets/nix-cache-auth.age;
mode = "0400";
owner = "caddy";
group = "caddy";
};
};
}
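
A minimal sketch of how a service might consume one of these env-style secrets, assuming agenix decrypts each entry to `config.age.secrets.<name>.path` at activation (the same pattern the murmur `environmentFile` change uses). Wiring the Njalla token into Caddy through `EnvironmentFile` is an assumption for illustration; the diff does not show how Caddy actually reads it.

```nix
{ config, ... }:
{
  # Hypothetical wiring: expose NJALLA_API_TOKEN to the Caddy unit.
  # The decrypted file is expected to hold a line of the form
  # NJALLA_API_TOKEN=... as described in the secret's comment.
  systemd.services.caddy.serviceConfig.EnvironmentFile =
    config.age.secrets.njalla-api-token-env.path;
}
```

Because the secret is owned by `caddy` with mode `0400`, only that user can read the decrypted file.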

View File

@@ -5,6 +5,20 @@
service_configs,
...
}:
let
hddTuneIosched = pkgs.writeShellScript "hdd-tune-iosched" ''
# Called by udev with the partition kernel name (e.g. sdb1).
# Derives the parent disk and applies mq-deadline iosched params.
parent=''${1%%[0-9]*}
dev="/sys/block/$parent"
[ -d "$dev/queue/iosched" ] || exit 0
echo 500 > "$dev/queue/iosched/read_expire"
echo 15000 > "$dev/queue/iosched/write_expire"
echo 128 > "$dev/queue/iosched/fifo_batch"
echo 16 > "$dev/queue/iosched/writes_starved"
echo 4096 > "$dev/queue/max_sectors_kb" 2>/dev/null || true
'';
in
{
boot.initrd.availableKernelModules = [
"xhci_pci"
@@ -22,56 +36,27 @@
hardware.cpu.amd.updateMicrocode = true;
hardware.enableRedistributableFirmware = true;
# HDD I/O tuning for torrent seeding workload (high-concurrency random reads).
# HDD I/O tuning for torrent seeding workload (high-concurrency random reads)
# sharing the pool with latency-sensitive sequential reads (Jellyfin playback).
#
# mq-deadline sorts requests into elevator sweeps, reducing seek distance.
# Aggressive deadlines (15s) let the scheduler accumulate more ops before dispatching,
# maximizing coalescence — latency is irrelevant since torrent peers tolerate 30-60s.
# read_expire=500ms keeps reads bounded so a Jellyfin segment can't queue for
# seconds behind a torrent burst; write_expire=15s lets the scheduler batch
# writes for coalescence (torrent writes are async and tolerate delay).
# The bulk of read coalescence already happens above the scheduler via ZFS
# aggregation (zfs_vdev_aggregation_limit=4M, read_gap_limit=128K,
# async_read_max=32), so the scheduler deadline only needs to be large enough
# to keep the elevator sweep coherent -- 500ms is plenty on rotational disks.
# fifo_batch=128 keeps sweeps long; writes_starved=16 heavily favors reads.
# 4 MiB readahead matches libtorrent piece extent affinity for sequential prefetch.
#
# This runs as a systemd oneshot rather than udev rules because the NixOS ZFS module
# hardcodes a udev rule that forces scheduler="none" on all ZFS member partitions'
# parent disks, overriding any scheduler set via udev on the disk event.
systemd.services.hdd-io-tuning = {
description = "HDD I/O scheduler and queue tuning";
after = [
"zfs-import.target"
"systemd-udev-settle.service"
];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
};
path = with pkgs; [
coreutils
gawk
zfs
];
script = ''
# Only tune disks in the hdds pool not all rotational disks.
# zpool status gives by-id device names; we resolve to /sys/block/<name>.
zpool status hdds | awk '/^\t/ && $1 ~ /^(ata-|nvme-|scsi-)/ {print $1}' | while read -r id; do
link="/dev/disk/by-id/$id"
[ -L "$link" ] || continue
name=$(basename "$(readlink -f "$link")")
dev="/sys/block/$name"
[ -d "$dev" ] || continue
echo mq-deadline > "$dev/queue/scheduler"
echo 4096 > "$dev/queue/read_ahead_kb"
echo 512 > "$dev/queue/nr_requests"
echo 15000 > "$dev/queue/iosched/read_expire"
echo 15000 > "$dev/queue/iosched/write_expire"
echo 128 > "$dev/queue/iosched/fifo_batch"
echo 16 > "$dev/queue/iosched/writes_starved"
echo 4096 > "$dev/queue/max_sectors_kb" 2>/dev/null || true
echo "Tuned $id -> $name: mq-deadline, 4M readahead, 15s deadlines"
done
'';
};
# The NixOS ZFS module hardcodes a udev rule that forces scheduler="none" on all
# ZFS member partitions' parent disks (on both add AND change events). We counter
# it with lib.mkAfter so our rule appears after theirs in 99-local.rules — our
# rule matches the same partition events and sets mq-deadline back, then a RUN
# script applies the iosched params. Only targets rotational, non-removable disks
# (i.e. HDDs, not SSDs or USB).
services.udev.extraRules = lib.mkAfter ''
ACTION=="add|change", KERNEL=="sd[a-z]*[0-9]*", ENV{ID_FS_TYPE}=="zfs_member", ATTR{../queue/rotational}=="1", ATTR{../removable}=="0", ATTR{../queue/scheduler}="mq-deadline", ATTR{../queue/read_ahead_kb}="4096", ATTR{../queue/nr_requests}="512", RUN+="${hddTuneIosched} %k"
'';
}

View File

@@ -24,6 +24,7 @@
# ZFS cache directory - persisting the directory instead of the file
# avoids "device busy" errors when ZFS atomically updates the cache
"/etc/zfs"
"/var/lib/gitea-runner"
];
files = [

View File

@@ -10,20 +10,16 @@ inputs.nixpkgs.lib.extend (
lib = prev;
in
{
# stolen from: https://stackoverflow.com/a/42398526
optimizeWithFlags =
pkg: flags:
lib.overrideDerivation pkg (
old:
let
newflags = lib.foldl' (acc: x: "${acc} ${x}") "" flags;
oldflags = if (lib.hasAttr "NIX_CFLAGS_COMPILE" old) then "${old.NIX_CFLAGS_COMPILE}" else "";
in
{
NIX_CFLAGS_COMPILE = "${oldflags} ${newflags}";
# stdenv = pkgs.clang19Stdenv;
}
);
pkg.overrideAttrs (old: {
env = (old.env or { }) // {
NIX_CFLAGS_COMPILE =
(old.env.NIX_CFLAGS_COMPILE or old.NIX_CFLAGS_COMPILE or "")
+ " "
+ (lib.concatStringsSep " " flags);
};
});
optimizePackage =
pkg:
@@ -63,8 +59,12 @@ inputs.nixpkgs.lib.extend (
{ pkgs, config, ... }:
{
systemd.services."${serviceName}-mounts" = {
wants = [ "zfs.target" ] ++ lib.optionals (zpool != "") [ "zfs-import-${zpool}.service" ];
after = lib.optionals (zpool != "") [ "zfs-import-${zpool}.service" ];
wants = [
"zfs.target"
"zfs-mount.service"
]
++ lib.optionals (zpool != "") [ "zfs-import-${zpool}.service" ];
after = [ "zfs-mount.service" ] ++ lib.optionals (zpool != "") [ "zfs-import-${zpool}.service" ];
before = [ "${serviceName}.service" ];
serviceConfig = {
@@ -180,5 +180,108 @@ inputs.nixpkgs.lib.extend (
after = [ "${serviceName}-file-perms.service" ];
};
};
# Creates a Caddy virtualHost with reverse_proxy to a local or VPN-namespaced port.
# Use `subdomain` for "<name>.${domain}" or `domain` for a full custom domain.
# Exactly one of `subdomain` or `domain` must be provided.
mkCaddyReverseProxy =
{
subdomain ? null,
domain ? null,
port,
auth ? false,
vpn ? false,
}:
assert (subdomain != null) != (domain != null);
{ config, ... }:
let
vhostDomain = if domain != null then domain else "${subdomain}.${service_configs.https.domain}";
upstream =
if vpn then
"${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString port}"
else
":${builtins.toString port}";
in
{
services.caddy.virtualHosts."${vhostDomain}".extraConfig = lib.concatStringsSep "\n" (
lib.optional auth "import ${config.age.secrets.caddy_auth.path}" ++ [ "reverse_proxy ${upstream}" ]
);
};
# Creates a fail2ban jail with systemd journal backend.
# Covers the common pattern: journal-based detection, http/https ports, default thresholds.
mkFail2banJail =
{
name,
unitName ? "${name}.service",
failregex,
}:
{ ... }:
{
services.fail2ban.jails.${name} = {
enabled = true;
settings = {
backend = "systemd";
port = "http,https";
# defaults: maxretry=5, findtime=10m, bantime=10m
};
filter.Definition = {
inherit failregex;
ignoreregex = "";
journalmatch = "_SYSTEMD_UNIT=${unitName}";
};
};
};
# Creates a hardened Grafana annotation daemon service.
# Provides DynamicUser, sandboxing, state directory, and GRAFANA_URL/STATE_FILE automatically.
mkGrafanaAnnotationService =
{
name,
description,
script,
after ? [ ],
environment ? { },
loadCredential ? null,
}:
{
systemd.services."${name}-annotations" = {
inherit description;
after = [
"network.target"
"grafana.service"
]
++ after;
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = "${pkgs.python3}/bin/python3 ${script}";
Restart = "always";
RestartSec = "10s";
DynamicUser = true;
StateDirectory = "${name}-annotations";
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
RestrictAddressFamilies = [
"AF_INET"
"AF_INET6"
];
MemoryDenyWriteExecute = true;
}
// lib.optionalAttrs (loadCredential != null) {
LoadCredential = loadCredential;
};
environment = {
GRAFANA_URL = "http://127.0.0.1:${toString service_configs.ports.private.grafana.port}";
STATE_FILE = "/var/lib/${name}-annotations/state.json";
}
// environment;
};
};
# Shell command to extract an API key from an *arr config.xml file.
# Returns a string suitable for $() command substitution in shell scripts.
extractArrApiKey =
configXmlPath: "${lib.getExe pkgs.gnugrep} -oP '(?<=<ApiKey>)[^<]+' ${configXmlPath}";
}
)
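
The helpers added in this file return NixOS modules, so they would typically be used from `imports`. The sketch below assumes the extended `lib` reaches modules as their `lib` argument (for example through `specialArgs`); the service name `example`, port `8080`, the regex, and the compiler flags are placeholders, not values from this repository.

```nix
{ lib, pkgs, ... }:
{
  imports = [
    # Serves "example.<https.domain>" from a local port, behind caddy_auth.
    (lib.mkCaddyReverseProxy {
      subdomain = "example"; # placeholder; pass `domain` instead for a full custom domain
      port = 8080; # placeholder port
      auth = true;
    })

    # Journal-backed jail matching example.service; the regex is illustrative only.
    (lib.mkFail2banJail {
      name = "example";
      failregex = "^.*Failed login attempt from <HOST>$";
    })
  ];

  # optimizeWithFlags appends to NIX_CFLAGS_COMPILE. For a package that sets
  # no flags of its own, the variable becomes " -O3 -march=x86-64-v3"
  # (note the leading space from the "" + " " + flags concatenation).
  environment.systemPackages = [
    (lib.optimizeWithFlags pkgs.hello [
      "-O3"
      "-march=x86-64-v3"
    ])
  ];
}
```

Passing both `subdomain` and `domain` (or neither) to `mkCaddyReverseProxy` trips its `assert`, so evaluation fails early rather than producing an ambiguous virtual host.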

View File

@@ -43,4 +43,36 @@ final: prev: {
}
);
};
jellyfin-exporter = prev.buildGoModule rec {
pname = "jellyfin-exporter";
version = "unstable-2025-03-27";
src = prev.fetchFromGitHub {
owner = "rebelcore";
repo = "jellyfin_exporter";
rev = "8e3970cb1bdf3cb21fac099c13072bb7c1b20cf9";
hash = "sha256-wDnhepYj1MyLRZlwKfmwf4xiEEL3mgQY6V+7TnBd0MY=";
};
vendorHash = "sha256-e08u10e/wNapNZSsD/fGVN9ybMHe3sW0yDIOqI8ZcYs=";
# upstream tests require a running Jellyfin instance
doCheck = false;
meta.mainProgram = "jellyfin_exporter";
};
igpu-exporter = prev.buildGoModule rec {
pname = "igpu-exporter";
version = "unstable-2025-03-27";
src = prev.fetchFromGitHub {
owner = "mike1808";
repo = "igpu-exporter";
rev = "db2dace1a895c2b950f6d3ba1a2e46729251d124";
hash = "sha256-xWTiu26UzTZIK/6jeda+x6VePUgoWTS0AekejFdgFWs=";
};
vendorHash = "sha256-oeCSKwDKVwvYQ1fjXXTwQSXNl/upDE3WAAk680vqh3U=";
subPackages = [ "cmd" ];
postInstall = ''
mv $out/bin/cmd $out/bin/igpu-exporter
'';
meta.mainProgram = "igpu-exporter";
};
}

41
modules/power.nix Normal file

@@ -0,0 +1,41 @@
{
...
}:
{
powerManagement = {
enable = true;
cpuFreqGovernor = "powersave";
};
# Always-on server: disable all sleep targets.
systemd.targets = {
sleep.enable = false;
suspend.enable = false;
hibernate.enable = false;
hybrid-sleep.enable = false;
};
boot.kernelParams = [
# Disable NMI watchdog at boot. Eliminates periodic perf-counter interrupts
# across all cores (~1 W). Safe: apcupsd provides hardware hang detection
# via UPS, and softlockup watchdog remains active.
"nmi_watchdog=0"
# Route kernel work items to already-busy CPUs rather than waking idle ones.
# Reduces C-state exit frequency at the cost of slightly higher latency on
# work items -- irrelevant for a server whose latency-sensitive paths are
# all in userspace (caddy, jellyfin).
"workqueue.power_efficient=1"
];
boot.kernel.sysctl = {
# Belt-and-suspenders: also set via boot param, but sysctl ensures it
# stays off if anything re-enables it at runtime.
"kernel.nmi_watchdog" = 0;
};
# Server has no audio consumers. Power-gate the HDA codec at module load.
boot.extraModprobeConfig = ''
options snd_hda_intel power_save=1 power_save_controller=Y
'';
}

View File

@@ -13,6 +13,89 @@
# disable coredumps
systemd.coredump.enable = false;
# Needed for Nix sandbox UID/GID mapping inside derivation builds.
# See https://github.com/NixOS/nixpkgs/issues/287194
security.unprivilegedUsernsClone = true;
# Disable kexec to prevent replacing the running kernel at runtime.
security.protectKernelImage = true;
# Kernel hardening boot parameters. These recover most of the runtime-
# configurable protections that the linux-hardened patchset provided.
boot.kernelParams = [
# Zero page-allocator and slab memory on alloc and on free. Prevents info leaks
# and use-after-free from seeing stale data. Modest CPU overhead.
"init_on_alloc=1"
"init_on_free=1"
# Prevent SLUB allocator from merging caches with similar size/flags.
# Keeps different kernel object types in separate slabs, making heap
# exploitation (type confusion, spray, use-after-free) significantly harder.
"slab_nomerge"
# Randomize order of pages returned by the buddy allocator.
"page_alloc.shuffle=1"
# Disable debugfs entirely (exposes kernel internals).
"debugfs=off"
# Disable legacy vsyscall emulation (unused by any modern glibc).
"vsyscall=none"
# Strict IOMMU TLB invalidation (no batching). Prevents DMA-capable
# devices from accessing stale mappings after unmap.
"iommu.strict=1"
];
boot.kernel.sysctl = {
# Reboot immediately on kernel panic (don't leave a compromised
# kernel running). Negative value = reboot without delay. An oops only
# escalates to a panic when kernel.panic_on_oops is set.
"kernel.panic" = -1;
# Hide kernel pointers from all processes, including CAP_SYSLOG.
# Prevents info leaks used to defeat KASLR.
"kernel.kptr_restrict" = 2;
# Disable bpf() JIT compiler (eliminates JIT spray attack vector).
"net.core.bpf_jit_enable" = false;
# Disable ftrace (kernel function tracer) at runtime.
"kernel.ftrace_enabled" = false;
# Strict reverse-path filtering: drop packets arriving on an interface
# where the source address isn't routable back via that interface.
"net.ipv4.conf.all.rp_filter" = 1;
"net.ipv4.conf.default.rp_filter" = 1;
"net.ipv4.conf.all.log_martians" = true;
"net.ipv4.conf.default.log_martians" = true;
# Ignore ICMP redirects (prevents route table poisoning).
"net.ipv4.conf.all.accept_redirects" = false;
"net.ipv4.conf.all.secure_redirects" = false;
"net.ipv4.conf.default.accept_redirects" = false;
"net.ipv4.conf.default.secure_redirects" = false;
"net.ipv6.conf.all.accept_redirects" = false;
"net.ipv6.conf.default.accept_redirects" = false;
# Don't send ICMP redirects (we are not a router).
"net.ipv4.conf.all.send_redirects" = false;
"net.ipv4.conf.default.send_redirects" = false;
# Ignore broadcast ICMP (SMURF amplification mitigation).
"net.ipv4.icmp_echo_ignore_broadcasts" = true;
# Filesystem hardening: prevent hardlink/symlink-based attacks.
# protected_hardlinks: block unprivileged creation of hardlinks to files
# the user doesn't own or can't read and write.
# protected_symlinks: only follow symlinks in world-writable sticky
# directories when the symlink owner matches the follower or the
# directory owner (prevents TOCTOU privilege escalation).
# protected_fifos/regular (level 2): refuse O_CREAT opens of FIFOs and
# regular files the caller doesn't own in world-writable or group-writable
# sticky directories.
# Also required for systemd-tmpfiles to chmod hardlinked files.
"fs.protected_hardlinks" = true;
"fs.protected_symlinks" = true;
"fs.protected_fifos" = 2;
"fs.protected_regular" = 2;
};
services = {
dbus.implementation = "broker";
/*

View File

@@ -1,15 +1,39 @@
{
config,
lib,
service_configs,
pkgs,
...
}:
let
# Total RAM in bytes (from /proc/meminfo: 65775836 KiB).
totalRamBytes = 65775836 * 1024;
# Hugepage reservations that the kernel carves out before ZFS can use them.
hugepages2mBytes = service_configs.hugepages_2m.total_pages * 2 * 1024 * 1024;
hugepages1gBytes = 3 * 1024 * 1024 * 1024; # 3x 1G pages for RandomX (xmrig.nix)
totalHugepageBytes = hugepages2mBytes + hugepages1gBytes;
# ARC max: 60% of RAM remaining after hugepages. Leaves headroom for
# application RSS (PostgreSQL, qBittorrent, Jellyfin, Grafana, etc.),
# kernel slabs, and page cache.
arcMaxBytes = (totalRamBytes - totalHugepageBytes) * 60 / 100;
in
{
boot.zfs.package = pkgs.zfs;
boot.zfs.package = pkgs.zfs_2_4;
boot.initrd.kernelModules = [ "zfs" ];
boot.kernelParams = [
"zfs.zfs_txg_timeout=120" # longer TXG open time = larger sequential writes
# 120s TXG timeout: batch more dirty data per transaction group so the
# HDD pool (hdds) writes larger, sequential I/Os instead of many small syncs.
# This is a global setting (no per-pool control); the SSD pool (tank) syncs
# infrequently but handles it fine since SSDs don't suffer from seek overhead.
"zfs.zfs_txg_timeout=120"
# Cap ARC to prevent it from claiming memory reserved for hugepages.
# Without this, ZFS auto-sizes c_max to ~62 GiB on a 64 GiB system,
# ignoring the 11.5 GiB of hugepage reservations.
"zfs.zfs_arc_max=${toString arcMaxBytes}"
# vdev I/O scheduler: feed more concurrent reads to the block scheduler so
# mq-deadline has a larger pool of requests to sort and merge into elevator sweeps.
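
The ARC cap arithmetic from the `let` block can be checked in isolation. This is a sketch under one assumption: `service_configs.hugepages_2m.total_pages` is not shown in this diff, so the value 4352 is inferred from the "11.5 GiB of hugepage reservations" comment minus the 3 GiB of 1G pages.

```nix
# Evaluate with: nix-instantiate --eval --strict arc.nix  (file name is arbitrary)
let
  totalRamBytes = 65775836 * 1024; # 67354456064

  # Assumption: 4352 x 2 MiB pages = 8.5 GiB (not taken from service-configs.nix).
  hugepages2mBytes = 4352 * 2 * 1024 * 1024; # 9126805504
  hugepages1gBytes = 3 * 1024 * 1024 * 1024; # 3221225472
  totalHugepageBytes = hugepages2mBytes + hugepages1gBytes; # 12348030976
in
{
  # Nix integers are 64-bit, so multiplying before dividing does not overflow
  # here and avoids truncating the remainder early.
  arcMaxBytes = (totalRamBytes - totalHugepageBytes) * 60 / 100; # 33003855052
}
```

Under that assumption the kernel parameter would render as `zfs.zfs_arc_max=33003855052`, roughly 30.7 GiB.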

View File

@@ -0,0 +1,443 @@
From f0582558f0a8b0ef543b3251c4a07afab89fde63 Mon Sep 17 00:00:00 2001
From: Simon Gardling <titaniumtown@proton.me>
Date: Fri, 17 Apr 2026 19:37:11 -0400
Subject: [PATCH] nixos/jellyfin: add declarative network.xml options
Adds services.jellyfin.network.* (baseUrl, ports, IPv4/6, LAN subnets,
known proxies, remote IP filter, etc.) and services.jellyfin.forceNetworkConfig,
mirroring the existing hardwareAcceleration / forceEncodingConfig pattern.
Motivation: running Jellyfin behind a reverse proxy requires configuring
KnownProxies (so the real client IP is extracted from X-Forwarded-For)
and LocalNetworkSubnets (so LAN clients are correctly classified and not
subject to RemoteClientBitrateLimit). These settings previously had no
declarative option -- they could only be set via the web dashboard or
by hand-editing network.xml, with no guarantee they would survive a
reinstall or be consistent across deployments.
Implementation:
- Adds a networkXmlText template alongside the existing encodingXmlText.
- Factors the force-vs-soft install logic out of preStart into a
small 'manage_config_xml' shell helper; encoding.xml and network.xml
now share the same install/backup semantics.
- Extends the VM test with a machineWithNetworkConfig node and a
subtest that verifies the declared values land in network.xml,
Jellyfin parses them at startup, and the backup-on-overwrite path
works (same shape as the existing 'Force encoding config' subtest).
---
nixos/modules/services/misc/jellyfin.nix | 303 ++++++++++++++++++++---
nixos/tests/jellyfin.nix | 50 ++++
2 files changed, 317 insertions(+), 36 deletions(-)
diff --git a/nixos/modules/services/misc/jellyfin.nix b/nixos/modules/services/misc/jellyfin.nix
index 5c08fc478e45..387da907c652 100644
--- a/nixos/modules/services/misc/jellyfin.nix
+++ b/nixos/modules/services/misc/jellyfin.nix
@@ -26,8 +26,10 @@ let
bool
enum
ints
+ listOf
nullOr
path
+ port
str
submodule
;
@@ -68,6 +70,41 @@ let
</EncodingOptions>
'';
encodingXmlFile = pkgs.writeText "encoding.xml" encodingXmlText;
+ stringListToXml =
+ tag: items:
+ if items == [ ] then
+ "<${tag} />"
+ else
+ "<${tag}>\n ${
+ concatMapStringsSep "\n " (item: "<string>${escapeXML item}</string>") items
+ }\n </${tag}>";
+ networkXmlText = ''
+ <?xml version="1.0" encoding="utf-8"?>
+ <NetworkConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+ <BaseUrl>${escapeXML cfg.network.baseUrl}</BaseUrl>
+ <EnableHttps>${boolToString cfg.network.enableHttps}</EnableHttps>
+ <RequireHttps>${boolToString cfg.network.requireHttps}</RequireHttps>
+ <InternalHttpPort>${toString cfg.network.internalHttpPort}</InternalHttpPort>
+ <InternalHttpsPort>${toString cfg.network.internalHttpsPort}</InternalHttpsPort>
+ <PublicHttpPort>${toString cfg.network.publicHttpPort}</PublicHttpPort>
+ <PublicHttpsPort>${toString cfg.network.publicHttpsPort}</PublicHttpsPort>
+ <AutoDiscovery>${boolToString cfg.network.autoDiscovery}</AutoDiscovery>
+ <EnableUPnP>${boolToString cfg.network.enableUPnP}</EnableUPnP>
+ <EnableIPv4>${boolToString cfg.network.enableIPv4}</EnableIPv4>
+ <EnableIPv6>${boolToString cfg.network.enableIPv6}</EnableIPv6>
+ <EnableRemoteAccess>${boolToString cfg.network.enableRemoteAccess}</EnableRemoteAccess>
+ ${stringListToXml "LocalNetworkSubnets" cfg.network.localNetworkSubnets}
+ ${stringListToXml "LocalNetworkAddresses" cfg.network.localNetworkAddresses}
+ ${stringListToXml "KnownProxies" cfg.network.knownProxies}
+ <IgnoreVirtualInterfaces>${boolToString cfg.network.ignoreVirtualInterfaces}</IgnoreVirtualInterfaces>
+ ${stringListToXml "VirtualInterfaceNames" cfg.network.virtualInterfaceNames}
+ <EnablePublishedServerUriByRequest>${boolToString cfg.network.enablePublishedServerUriByRequest}</EnablePublishedServerUriByRequest>
+ ${stringListToXml "PublishedServerUriBySubnet" cfg.network.publishedServerUriBySubnet}
+ ${stringListToXml "RemoteIPFilter" cfg.network.remoteIPFilter}
+ <IsRemoteIPFilterBlacklist>${boolToString cfg.network.isRemoteIPFilterBlacklist}</IsRemoteIPFilterBlacklist>
+ </NetworkConfiguration>
+ '';
+ networkXmlFile = pkgs.writeText "network.xml" networkXmlText;
codecListToType =
desc: list:
submodule {
@@ -205,6 +242,196 @@ in
'';
};
+ network = {
+ baseUrl = mkOption {
+ type = str;
+ default = "";
+ example = "/jellyfin";
+ description = ''
+ Prefix added to Jellyfin's internal URLs when it sits behind a reverse proxy at a sub-path.
+ Leave empty when Jellyfin is served at the root of its host.
+ '';
+ };
+
+ enableHttps = mkOption {
+ type = bool;
+ default = false;
+ description = ''
+ Serve HTTPS directly from Jellyfin. Usually unnecessary when terminating TLS in a reverse proxy.
+ '';
+ };
+
+ requireHttps = mkOption {
+ type = bool;
+ default = false;
+ description = ''
+ Redirect plaintext HTTP requests to HTTPS. Only meaningful when {option}`enableHttps` is true.
+ '';
+ };
+
+ internalHttpPort = mkOption {
+ type = port;
+ default = 8096;
+ description = "TCP port Jellyfin binds for HTTP.";
+ };
+
+ internalHttpsPort = mkOption {
+ type = port;
+ default = 8920;
+ description = "TCP port Jellyfin binds for HTTPS. Only used when {option}`enableHttps` is true.";
+ };
+
+ publicHttpPort = mkOption {
+ type = port;
+ default = 8096;
+ description = "HTTP port Jellyfin advertises in server discovery responses and published URIs.";
+ };
+
+ publicHttpsPort = mkOption {
+ type = port;
+ default = 8920;
+ description = "HTTPS port Jellyfin advertises in server discovery responses and published URIs.";
+ };
+
+ autoDiscovery = mkOption {
+ type = bool;
+ default = true;
+ description = "Respond to LAN client auto-discovery broadcasts (UDP 7359).";
+ };
+
+ enableUPnP = mkOption {
+ type = bool;
+ default = false;
+ description = "Attempt to open the public ports on the router via UPnP.";
+ };
+
+ enableIPv4 = mkOption {
+ type = bool;
+ default = true;
+ description = "Listen on IPv4.";
+ };
+
+ enableIPv6 = mkOption {
+ type = bool;
+ default = true;
+ description = "Listen on IPv6.";
+ };
+
+ enableRemoteAccess = mkOption {
+ type = bool;
+ default = true;
+ description = ''
+ Allow connections from clients outside the subnets listed in {option}`localNetworkSubnets`.
+ When false, Jellyfin rejects non-local requests regardless of reverse proxy configuration.
+ '';
+ };
+
+ localNetworkSubnets = mkOption {
+ type = listOf str;
+ default = [ ];
+ example = [
+ "192.168.1.0/24"
+ "10.0.0.0/8"
+ ];
+ description = ''
+ CIDR ranges (or bare IPs) that Jellyfin classifies as the local network.
+ Clients originating from these ranges -- as seen after {option}`knownProxies` `X-Forwarded-For`
+ unwrapping -- are not subject to Jellyfin's remote-client bitrate limit (`RemoteClientBitrateLimit`).
+ '';
+ };
+
+ localNetworkAddresses = mkOption {
+ type = listOf str;
+ default = [ ];
+ example = [ "192.168.1.50" ];
+ description = ''
+ Specific interface addresses Jellyfin binds to. Leave empty to bind all interfaces.
+ '';
+ };
+
+ knownProxies = mkOption {
+ type = listOf str;
+ default = [ ];
+ example = [ "127.0.0.1" ];
+ description = ''
+ Addresses of reverse proxies trusted to forward the real client IP via `X-Forwarded-For`.
+ Without this, Jellyfin sees the proxy's address for every request and cannot apply
+ {option}`localNetworkSubnets` classification to the true client.
+ '';
+ };
+
+ ignoreVirtualInterfaces = mkOption {
+ type = bool;
+ default = true;
+ description = "Skip virtual network interfaces (matching {option}`virtualInterfaceNames`) during auto-bind.";
+ };
+
+ virtualInterfaceNames = mkOption {
+ type = listOf str;
+ default = [ "veth" ];
+ description = "Interface name prefixes treated as virtual when {option}`ignoreVirtualInterfaces` is true.";
+ };
+
+ enablePublishedServerUriByRequest = mkOption {
+ type = bool;
+ default = false;
+ description = ''
+ Derive the server's public URI from the incoming request's Host header instead of any
+ configured {option}`publishedServerUriBySubnet` entry.
+ '';
+ };
+
+ publishedServerUriBySubnet = mkOption {
+ type = listOf str;
+ default = [ ];
+ example = [ "192.168.1.0/24=http://jellyfin.lan:8096" ];
+ description = ''
+ Per-subnet overrides for the URI Jellyfin advertises to clients, in `subnet=uri` form.
+ '';
+ };
+
+ remoteIPFilter = mkOption {
+ type = listOf str;
+ default = [ ];
+ example = [ "203.0.113.0/24" ];
+ description = ''
+ IPs or CIDRs used as the allow- or denylist for remote access.
+ Behaviour is controlled by {option}`isRemoteIPFilterBlacklist`.
+ '';
+ };
+
+ isRemoteIPFilterBlacklist = mkOption {
+ type = bool;
+ default = false;
+ description = ''
+ When true, {option}`remoteIPFilter` is a denylist; when false, it is an allowlist
+ (and an empty list allows all remote addresses).
+ '';
+ };
+ };
+
+ forceNetworkConfig = mkOption {
+ type = bool;
+ default = false;
+ description = ''
+ Whether to overwrite Jellyfin's `network.xml` configuration file on each service start.
+
+ When enabled, the network configuration specified in {option}`services.jellyfin.network`
+ is applied on every service restart. A backup of the existing `network.xml` will be
+ created at `network.xml.backup-$timestamp`.
+
+ ::: {.warning}
+ Enabling this option means that any changes made to networking settings through
+ Jellyfin's web dashboard will be lost on the next service restart. The NixOS configuration
+ becomes the single source of truth for network settings.
+ :::
+
+ When disabled (the default), the network configuration is only written if no `network.xml`
+ exists yet. This allows settings to be changed through Jellyfin's web dashboard and persist
+ across restarts, but means the NixOS configuration options will be ignored after the initial setup.
+ '';
+ };
+
transcoding = {
maxConcurrentStreams = mkOption {
type = nullOr ints.positive;
@@ -384,46 +611,50 @@ in
wants = [ "network-online.target" ];
wantedBy = [ "multi-user.target" ];
- preStart = mkIf cfg.hardwareAcceleration.enable (
- ''
- configDir=${escapeShellArg cfg.configDir}
- encodingXml="$configDir/encoding.xml"
- ''
- + (
- if cfg.forceEncodingConfig then
- ''
- if [[ -e $encodingXml ]]; then
+ preStart =
+ let
+ # manage_config_xml <source> <destination> <force> <description>
+ #
+ # Installs a NixOS-declared XML config at <destination>, preserving
+ # any existing file as a timestamped backup when <force> is true.
+ # With <force>=false, leaves existing files untouched and warns if
+ # the on-disk content differs from the declared content.
+ helper = ''
+ manage_config_xml() {
+ local src="$1" dest="$2" force="$3" desc="$4"
+ if [[ -e "$dest" ]]; then
# this intentionally removes trailing newlines
- currentText="$(<"$encodingXml")"
- configuredText="$(<${encodingXmlFile})"
- if [[ $currentText == "$configuredText" ]]; then
- # don't need to do anything
- exit 0
- else
- encodingXmlBackup="$configDir/encoding.xml.backup-$(date -u +"%FT%H_%M_%SZ")"
- mv --update=none-fail -T "$encodingXml" "$encodingXmlBackup"
+ local currentText configuredText
+ currentText="$(<"$dest")"
+ configuredText="$(<"$src")"
+ if [[ "$currentText" == "$configuredText" ]]; then
+ return 0
fi
- fi
- cp --update=none-fail -T ${encodingXmlFile} "$encodingXml"
- chmod u+w "$encodingXml"
- ''
- else
- ''
- if [[ -e $encodingXml ]]; then
- # this intentionally removes trailing newlines
- currentText="$(<"$encodingXml")"
- configuredText="$(<${encodingXmlFile})"
- if [[ $currentText != "$configuredText" ]]; then
- echo "WARN: $encodingXml already exists and is different from the configured settings. transcoding options NOT applied." >&2
- echo "WARN: Set config.services.jellyfin.forceEncodingConfig = true to override." >&2
+ if [[ "$force" == true ]]; then
+ local backup
+ backup="$dest.backup-$(date -u +"%FT%H_%M_%SZ")"
+ mv --update=none-fail -T "$dest" "$backup"
+ else
+ echo "WARN: $dest already exists and is different from the configured settings. $desc options NOT applied." >&2
+ echo "WARN: Set the corresponding force*Config option to override." >&2
+ return 0
fi
- else
- cp --update=none-fail -T ${encodingXmlFile} "$encodingXml"
- chmod u+w "$encodingXml"
fi
- ''
- )
- );
+ cp --update=none-fail -T "$src" "$dest"
+ chmod u+w "$dest"
+ }
+ configDir=${escapeShellArg cfg.configDir}
+ '';
+ in
+ (
+ helper
+ + optionalString cfg.hardwareAcceleration.enable ''
+ manage_config_xml ${encodingXmlFile} "$configDir/encoding.xml" ${boolToString cfg.forceEncodingConfig} transcoding
+ ''
+ + ''
+ manage_config_xml ${networkXmlFile} "$configDir/network.xml" ${boolToString cfg.forceNetworkConfig} network
+ ''
+ );
# This is mostly follows: https://github.com/jellyfin/jellyfin/blob/master/fedora/jellyfin.service
# Upstream also disable some hardenings when running in LXC, we do the same with the isContainer option
diff --git a/nixos/tests/jellyfin.nix b/nixos/tests/jellyfin.nix
index 4896c13d4eca..0c9191960f78 100644
--- a/nixos/tests/jellyfin.nix
+++ b/nixos/tests/jellyfin.nix
@@ -63,6 +63,26 @@
environment.systemPackages = with pkgs; [ ffmpeg ];
virtualisation.diskSize = 3 * 1024;
};
+
+ machineWithNetworkConfig = {
+ services.jellyfin = {
+ enable = true;
+ forceNetworkConfig = true;
+ network = {
+ localNetworkSubnets = [
+ "192.168.1.0/24"
+ "10.0.0.0/8"
+ ];
+ knownProxies = [ "127.0.0.1" ];
+ enableUPnP = false;
+ enableIPv6 = false;
+ remoteIPFilter = [ "203.0.113.5" ];
+ isRemoteIPFilterBlacklist = true;
+ };
+ };
+ environment.systemPackages = with pkgs; [ ffmpeg ];
+ virtualisation.diskSize = 3 * 1024;
+ };
};
# Documentation of the Jellyfin API: https://api.jellyfin.org/
@@ -122,6 +142,36 @@
# Verify the new encoding.xml does not have the marker (was overwritten)
machineWithForceConfig.fail("grep -q 'MARKER' /var/lib/jellyfin/config/encoding.xml")
+ # Test forceNetworkConfig and network.xml generation
+ with subtest("Force network config writes declared values and backs up on overwrite"):
+ wait_for_jellyfin(machineWithNetworkConfig)
+
+ # Verify network.xml exists and contains the declared values
+ machineWithNetworkConfig.succeed("test -f /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<string>192.168.1.0/24</string>' /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<string>10.0.0.0/8</string>' /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<string>127.0.0.1</string>' /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<string>203.0.113.5</string>' /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<IsRemoteIPFilterBlacklist>true</IsRemoteIPFilterBlacklist>' /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<EnableIPv6>false</EnableIPv6>' /var/lib/jellyfin/config/network.xml")
+ machineWithNetworkConfig.succeed("grep -F '<EnableUPnP>false</EnableUPnP>' /var/lib/jellyfin/config/network.xml")
+
+ # Stop service before modifying config
+ machineWithNetworkConfig.succeed("systemctl stop jellyfin.service")
+
+ # Plant a marker so we can prove the backup-and-overwrite path runs
+ machineWithNetworkConfig.succeed("echo '<!-- NETMARKER -->' > /var/lib/jellyfin/config/network.xml")
+
+ # Restart the service to trigger the backup
+ machineWithNetworkConfig.succeed("systemctl restart jellyfin.service")
+ wait_for_jellyfin(machineWithNetworkConfig)
+
+ # Verify the marked content was preserved as a timestamped backup
+ machineWithNetworkConfig.succeed("grep -q 'NETMARKER' /var/lib/jellyfin/config/network.xml.backup-*")
+
+ # Verify the new network.xml does not have the marker (was overwritten)
+ machineWithNetworkConfig.fail("grep -q 'NETMARKER' /var/lib/jellyfin/config/network.xml")
+
auth_header = 'MediaBrowser Client="NixOS Integration Tests", DeviceId="1337", Device="Apple II", Version="20.09"'
--
2.53.0

BIN
secrets/ci-deploy-key.age Normal file

Binary file not shown.

BIN
secrets/nix-cache-auth.age Normal file

Binary file not shown.

@@ -81,6 +81,12 @@ rec {
port = 6011;
proto = "tcp";
};
# Webhook receiver for the Jellyfin-qBittorrent monitor — Jellyfin pushes
# playback events here so throttling reacts without waiting for the poll.
jellyfin_qbittorrent_monitor_webhook = {
port = 9898;
proto = "tcp";
};
bitmagnet = {
port = 3333;
proto = "tcp";
@@ -153,6 +159,50 @@ rec {
port = 8020;
proto = "tcp";
};
grafana = {
port = 3000;
proto = "tcp";
};
prometheus = {
port = 9090;
proto = "tcp";
};
prometheus_node = {
port = 9100;
proto = "tcp";
};
prometheus_apcupsd = {
port = 9162;
proto = "tcp";
};
llama_cpp = {
port = 6688;
proto = "tcp";
};
trilium = {
port = 8787;
proto = "tcp";
};
jellyfin_exporter = {
port = 9594;
proto = "tcp";
};
qbittorrent_exporter = {
port = 9561;
proto = "tcp";
};
igpu_exporter = {
port = 9563;
proto = "tcp";
};
prometheus_zfs = {
port = 9134;
proto = "tcp";
};
harmonia = {
port = 5500;
proto = "tcp";
};
};
};
@@ -189,6 +239,17 @@ rec {
torrent = {
SavePath = torrents_path;
TempPath = torrents_path + "/incomplete";
categories = {
anime = torrents_path + "/anime";
archive = torrents_path + "/archive";
audiobooks = torrents_path + "/audiobooks";
books = torrents_path + "/books";
games = torrents_path + "/games";
movies = torrents_path + "/movies";
music = torrents_path + "/music";
musicals = torrents_path + "/musicals";
tvshows = torrents_path + "/tvshows";
};
};
jellyfin = {
@@ -212,6 +273,7 @@ rec {
p2pool = {
dataDir = services_dir + "/p2pool";
walletAddress = "49b6NT2k7fQHs8JvF7naUvchYwTQmRpoMMXb1KJTg5UcZVmyPJ7n6jgiH8DrvEsMg5GvMjJqPB1c1PTBAYtUTsbeHe5YMBx";
};
matrix = {
@@ -265,6 +327,15 @@ rec {
domain = "firefox-sync.${https.domain}";
};
grafana = {
dir = services_dir + "/grafana";
domain = "grafana.${https.domain}";
};
trilium = {
dataDir = services_dir + "/trilium";
};
media = {
moviesDir = torrents_path + "/media/movies";
tvDir = torrents_path + "/media/tv";

@@ -1,5 +1,6 @@
{
pkgs,
lib,
service_configs,
...
}:
@@ -12,7 +13,6 @@ let
curl = "${pkgs.curl}/bin/curl";
jq = "${pkgs.jq}/bin/jq";
grep = "${pkgs.gnugrep}/bin/grep";
# Max items to search per cycle per category (missing + cutoff) per app
maxPerCycle = 5;
@@ -20,8 +20,8 @@ let
searchScript = pkgs.writeShellScript "arr-search" ''
set -euo pipefail
- RADARR_KEY=$(${grep} -oP '(?<=<ApiKey>)[^<]+' ${radarrConfig})
- SONARR_KEY=$(${grep} -oP '(?<=<ApiKey>)[^<]+' ${sonarrConfig})
+ RADARR_KEY=$(${lib.extractArrApiKey radarrConfig})
+ SONARR_KEY=$(${lib.extractArrApiKey sonarrConfig})
search_radarr() {
local endpoint="$1"

@@ -16,6 +16,11 @@
(lib.serviceFilePerms "bazarr" [
"Z ${service_configs.bazarr.dataDir} 0700 ${config.services.bazarr.user} ${config.services.bazarr.group}"
])
(lib.mkCaddyReverseProxy {
subdomain = "bazarr";
port = service_configs.ports.private.bazarr.port;
auth = true;
})
];
services.bazarr = {
@@ -23,11 +28,6 @@
listenPort = service_configs.ports.private.bazarr.port;
};
services.caddy.virtualHosts."bazarr.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy :${builtins.toString service_configs.ports.private.bazarr.port}
'';
users.users.${config.services.bazarr.user}.extraGroups = [
service_configs.media_group
];

@@ -8,13 +8,26 @@
dataDir = service_configs.prowlarr.dataDir;
apiVersion = "v1";
networkNamespacePath = "/run/netns/wg";
networkNamespaceService = "wg";
# Guarantee critical config.xml elements before startup. Prowlarr has a
# history of losing <Port> from config.xml, causing the service to run
# without binding any socket. See arr-init's configXml for details.
configXml = {
Port = service_configs.ports.private.prowlarr.port;
BindAddress = "*";
EnableSsl = false;
};
# Prowlarr runs in the wg netns; Sonarr/Radarr in the host netns.
# From host netns, Prowlarr is reachable at the wg namespace address,
# not at localhost (which resolves to the host's own netns).
# Health checks can now run — the reverse-connect is reachable.
healthChecks = true;
syncedApps = [
{
name = "Sonarr";
implementation = "Sonarr";
configContract = "SonarrSettings";
- prowlarrUrl = "http://localhost:${builtins.toString service_configs.ports.private.prowlarr.port}";
+ prowlarrUrl = "http://${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString service_configs.ports.private.prowlarr.port}";
baseUrl = "http://${config.vpnNamespaces.wg.bridgeAddress}:${builtins.toString service_configs.ports.private.sonarr.port}";
apiKeyFrom = "${service_configs.sonarr.dataDir}/config.xml";
serviceName = "sonarr";
@@ -23,7 +36,7 @@
name = "Radarr";
implementation = "Radarr";
configContract = "RadarrSettings";
- prowlarrUrl = "http://localhost:${builtins.toString service_configs.ports.private.prowlarr.port}";
+ prowlarrUrl = "http://${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString service_configs.ports.private.prowlarr.port}";
baseUrl = "http://${config.vpnNamespaces.wg.bridgeAddress}:${builtins.toString service_configs.ports.private.radarr.port}";
apiKeyFrom = "${service_configs.radarr.dataDir}/config.xml";
serviceName = "radarr";
@@ -37,6 +50,11 @@
port = service_configs.ports.private.sonarr.port;
dataDir = service_configs.sonarr.dataDir;
healthChecks = true;
configXml = {
Port = service_configs.ports.private.sonarr.port;
BindAddress = "*";
EnableSsl = false;
};
rootFolders = [ service_configs.media.tvDir ];
naming = {
renameEpisodes = true;
@@ -69,6 +87,11 @@
port = service_configs.ports.private.radarr.port;
dataDir = service_configs.radarr.dataDir;
healthChecks = true;
configXml = {
Port = service_configs.ports.private.radarr.port;
BindAddress = "*";
EnableSsl = false;
};
rootFolders = [ service_configs.media.moviesDir ];
naming = {
renameMovies = true;
@@ -110,4 +133,21 @@
serviceName = "radarr";
};
};
services.jellyseerrInit = {
enable = true;
configDir = service_configs.jellyseerr.configDir;
radarr = {
profileName = "Remux + WEB 2160p";
dataDir = service_configs.radarr.dataDir;
port = service_configs.ports.private.radarr.port;
serviceName = "radarr";
};
sonarr = {
profileName = "WEB-2160p";
dataDir = service_configs.sonarr.dataDir;
port = service_configs.ports.private.sonarr.port;
serviceName = "sonarr";
};
};
}

@@ -13,6 +13,10 @@
(lib.serviceFilePerms "jellyseerr" [
"Z ${service_configs.jellyseerr.configDir} 0700 jellyseerr jellyseerr"
])
(lib.mkCaddyReverseProxy {
subdomain = "jellyseerr";
port = service_configs.ports.private.jellyseerr.port;
})
];
services.jellyseerr = {
@@ -36,8 +40,4 @@
users.groups.jellyseerr = { };
services.caddy.virtualHosts."jellyseerr.${service_configs.https.domain}".extraConfig = ''
# import ${config.age.secrets.caddy_auth.path}
reverse_proxy :${builtins.toString service_configs.ports.private.jellyseerr.port}
'';
}

@@ -14,6 +14,12 @@
(lib.serviceFilePerms "prowlarr" [
"Z ${service_configs.prowlarr.dataDir} 0700 prowlarr prowlarr"
])
(lib.mkCaddyReverseProxy {
subdomain = "prowlarr";
port = service_configs.ports.private.prowlarr.port;
auth = true;
vpn = true;
})
];
services.prowlarr = {
@@ -32,6 +38,17 @@
};
users.groups.prowlarr = { };
# The upstream prowlarr module hardcodes root:root in tmpfiles for custom dataDirs
# (systemd.tmpfiles.settings."10-prowlarr"), which gets applied by
# systemd-tmpfiles-setup.service on every boot/deploy, resetting the directory
# ownership and making Prowlarr unable to access its SQLite databases.
# Override to use the correct user as we disable DynamicUser
systemd.tmpfiles.settings."10-prowlarr".${service_configs.prowlarr.dataDir}.d = lib.mkForce {
user = "prowlarr";
group = "prowlarr";
mode = "0700";
};
systemd.services.prowlarr.serviceConfig = {
DynamicUser = lib.mkForce false;
User = "prowlarr";
@@ -40,8 +57,4 @@
ExecStart = lib.mkForce "${lib.getExe pkgs.prowlarr} -nobrowser -data=${service_configs.prowlarr.dataDir}";
};
services.caddy.virtualHosts."prowlarr.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy ${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString service_configs.ports.private.prowlarr.port}
'';
}

@@ -16,6 +16,11 @@
(lib.serviceFilePerms "radarr" [
"Z ${service_configs.radarr.dataDir} 0700 ${config.services.radarr.user} ${config.services.radarr.group}"
])
(lib.mkCaddyReverseProxy {
subdomain = "radarr";
port = service_configs.ports.private.radarr.port;
auth = true;
})
];
services.radarr = {
@@ -25,11 +30,6 @@
settings.update.mechanism = "external";
};
services.caddy.virtualHosts."radarr.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy :${builtins.toString service_configs.ports.private.radarr.port}
'';
users.users.${config.services.radarr.user}.extraGroups = [
service_configs.media_group
];

@@ -13,8 +13,8 @@ let
# Runs as root (via + prefix) after the NixOS module writes config.json.
# Extracts API keys from radarr/sonarr config.xml and injects them via jq.
injectApiKeys = pkgs.writeShellScript "recyclarr-inject-api-keys" ''
- RADARR_KEY=$(${lib.getExe pkgs.gnugrep} -oP '(?<=<ApiKey>)[^<]+' ${radarrConfig})
- SONARR_KEY=$(${lib.getExe pkgs.gnugrep} -oP '(?<=<ApiKey>)[^<]+' ${sonarrConfig})
+ RADARR_KEY=$(${lib.extractArrApiKey radarrConfig})
+ SONARR_KEY=$(${lib.extractArrApiKey sonarrConfig})
${pkgs.jq}/bin/jq \
--arg rk "$RADARR_KEY" \
--arg sk "$SONARR_KEY" \
@@ -46,50 +46,69 @@ in
radarr.movies = {
base_url = "http://localhost:${builtins.toString service_configs.ports.private.radarr.port}";
# Recyclarr is the sole authority for custom formats and scores.
# Overwrite any manually-created CFs and delete stale ones.
replace_existing_custom_formats = true;
delete_old_custom_formats = true;
include = [
{ template = "radarr-quality-definition-movie"; }
{ template = "radarr-quality-profile-remux-web-2160p"; }
{ template = "radarr-custom-formats-remux-web-2160p"; }
];
# Group WEB 2160p with 1080p in the same quality tier so custom
# format scores -- not quality ranking -- decide the winner.
# Native 4K with HDR/DV from good release groups scores high and
# wins; AI upscales get -10000 from the Upscaled CF and are
# blocked by min_format_score. Untagged upscales from unknown
# groups (score ~0) lose to well-scored 1080p (Tier 01 = +1750).
quality_profiles = [
{
name = "Remux + WEB 2160p";
min_format_score = 0;
- reset_unmatched_scores = {
- enabled = true;
- };
+ reset_unmatched_scores.enabled = true;
upgrade = {
allowed = true;
until_quality = "Remux-2160p";
until_score = 10000;
};
quality_sort = "top";
qualities = [
{ name = "Remux-2160p"; }
{
- name = "WEB 2160p";
+ name = "WEB/Bluray";
qualities = [
"WEBDL-2160p"
"WEBRip-2160p"
];
}
{ name = "Remux-1080p"; }
{ name = "Bluray-1080p"; }
{
name = "WEB 1080p";
qualities = [
"Remux-1080p"
"Bluray-1080p"
"WEBDL-1080p"
"WEBRip-1080p"
];
}
{ name = "HDTV-1080p"; }
{ name = "Bluray-720p"; }
{
name = "WEB 720p";
qualities = [
"WEBDL-720p"
"WEBRip-720p"
];
}
{ name = "HDTV-720p"; }
];
}
];
custom_formats = [
# Upscaled
# DV (w/o HDR fallback) - block releases with DV that lack HDR10 fallback
{
trash_ids = [ "923b6abef9b17f937fab56cfcf89e1f1" ];
assign_scores_to = [
{ name = "Remux + WEB 2160p"; }
];
}
# Upscaled - block AI upscales and other upscaled-to-2160p releases
{
trash_ids = [ "bfd8eb01832d646a0a89c4deb46f8564" ];
assign_scores_to = [
@@ -99,75 +118,74 @@ in
}
];
}
# x265 (HD) - override template -10000 penalty
{
trash_ids = [ "dc98083864ea246d05a42df0d05f81cc" ];
assign_scores_to = [
{
name = "Remux + WEB 2160p";
score = 0;
}
];
}
# x265 (no HDR/DV) - override template -10000 penalty
{
trash_ids = [ "839bea857ed2c0a8e084f3cbdbd65ecb" ];
assign_scores_to = [
{
name = "Remux + WEB 2160p";
score = 0;
}
];
}
];
};
sonarr.series = {
base_url = "http://localhost:${builtins.toString service_configs.ports.private.sonarr.port}";
# Recyclarr is the sole authority for custom formats and scores.
# Overwrite any manually-created CFs and delete stale ones.
replace_existing_custom_formats = true;
delete_old_custom_formats = true;
include = [
{ template = "sonarr-quality-definition-series"; }
{ template = "sonarr-v4-quality-profile-web-2160p"; }
{ template = "sonarr-v4-custom-formats-web-2160p"; }
];
# Group WEB 2160p with 1080p in the same quality tier so custom
# format scores -- not quality ranking -- decide the winner.
# Native 4K with HDR/DV from good release groups scores high and
# wins; AI upscales get -10000 from the Upscaled CF and are
# blocked by min_format_score. Untagged upscales from unknown
# groups (score ~0) lose to well-scored 1080p (Tier 01 = +1750).
quality_profiles = [
{
name = "WEB-2160p";
min_format_score = 0;
- reset_unmatched_scores = {
- enabled = true;
- };
+ reset_unmatched_scores.enabled = true;
upgrade = {
allowed = true;
- until_quality = "WEB 2160p";
+ until_quality = "WEB/Bluray";
until_score = 10000;
};
quality_sort = "top";
qualities = [
{
- name = "WEB 2160p";
+ name = "WEB/Bluray";
qualities = [
"WEBDL-2160p"
"WEBRip-2160p"
];
}
{ name = "Bluray-1080p Remux"; }
{ name = "Bluray-1080p"; }
{
name = "WEB 1080p";
qualities = [
"Bluray-1080p Remux"
"Bluray-1080p"
"WEBDL-1080p"
"WEBRip-1080p"
];
}
{ name = "HDTV-1080p"; }
{ name = "Bluray-720p"; }
{
name = "WEB 720p";
qualities = [
"WEBDL-720p"
"WEBRip-720p"
];
}
{ name = "HDTV-720p"; }
];
}
];
custom_formats = [
# Upscaled
# DV (w/o HDR fallback) - block releases with DV that lack HDR10 fallback
{
trash_ids = [ "9b27ab6498ec0f31a3353992e19434ca" ];
assign_scores_to = [
{ name = "WEB-2160p"; }
];
}
# Upscaled - block AI upscales and other upscaled-to-2160p releases
{
trash_ids = [ "23297a736ca77c0fc8e70f8edd7ee56c" ];
assign_scores_to = [
@@ -177,32 +195,23 @@ in
}
];
}
# x265 (HD) - override template -10000 penalty
{
trash_ids = [ "47435ece6b99a0b477caf360e79ba0bb" ];
assign_scores_to = [
{
name = "WEB-2160p";
score = 0;
}
];
}
# x265 (no HDR/DV) - override template -10000 penalty
{
trash_ids = [ "9b64dff695c2115facf1b6ea59c9bd07" ];
assign_scores_to = [
{
name = "WEB-2160p";
score = 0;
}
];
}
];
};
};
};
# Add secrets generation before recyclarr runs
# Trigger immediate sync on deploy when recyclarr config changes.
# restartTriggers on the oneshot service are unreliable (systemd may
# no-op a restart of an inactive oneshot). Instead, embed a config
# hash in the timer unit -- NixOS restarts changed timers reliably,
# and OnActiveSec fires the sync within seconds.
systemd.timers.recyclarr = {
timerConfig.OnActiveSec = "5s";
unitConfig.X-ConfigHash = builtins.hashString "sha256" (
builtins.toJSON config.services.recyclarr.configuration
);
};
systemd.services.recyclarr = {
after = [
"network-online.target"

@@ -16,6 +16,11 @@
(lib.serviceFilePerms "sonarr" [
"Z ${service_configs.sonarr.dataDir} 0700 ${config.services.sonarr.user} ${config.services.sonarr.group}"
])
(lib.mkCaddyReverseProxy {
subdomain = "sonarr";
port = service_configs.ports.private.sonarr.port;
auth = true;
})
];
systemd.tmpfiles.rules = [
@@ -31,11 +36,6 @@
settings.update.mechanism = "external";
};
services.caddy.virtualHosts."sonarr.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy :${builtins.toString service_configs.ports.private.sonarr.port}
'';
users.users.${config.services.sonarr.user}.extraGroups = [
service_configs.media_group
];

@@ -2,6 +2,7 @@
pkgs,
config,
service_configs,
lib,
...
}:
{
@@ -34,7 +35,7 @@
RADARR_CONFIG = "${service_configs.radarr.dataDir}/config.xml";
SONARR_URL = "http://localhost:${builtins.toString service_configs.ports.private.sonarr.port}";
SONARR_CONFIG = "${service_configs.sonarr.dataDir}/config.xml";
- CATEGORIES = "tvshows,movies,anime";
+ CATEGORIES = lib.concatStringsSep "," (builtins.attrNames service_configs.torrent.categories);
TAG_TORRENTS = "true";
};
};

@@ -5,9 +5,66 @@
lib,
...
}:
let
prowlarrPort = toString service_configs.ports.private.prowlarr.port;
sonarrPort = toString service_configs.ports.private.sonarr.port;
radarrPort = toString service_configs.ports.private.radarr.port;
bitmagnetPort = toString service_configs.ports.private.bitmagnet.port;
bridgeAddr = config.vpnNamespaces.wg.bridgeAddress;
prowlarrConfigXml = "${service_configs.prowlarr.dataDir}/config.xml";
sonarrConfigXml = "${service_configs.sonarr.dataDir}/config.xml";
radarrConfigXml = "${service_configs.radarr.dataDir}/config.xml";
curl = "${pkgs.curl}/bin/curl";
jq = "${pkgs.jq}/bin/jq";
# Clears the escalating failure backoff for the Bitmagnet indexer across
# Prowlarr, Sonarr, and Radarr so searches resume immediately after
# Bitmagnet restarts instead of waiting hours for disable timers to expire.
recoveryScript = pkgs.writeShellScript "prowlarr-bitmagnet-recovery" ''
set -euo pipefail
wait_for() {
for _ in $(seq 1 "$2"); do
${curl} -sf --max-time 5 "$1" > /dev/null && return 0
sleep 5
done
echo "$1 not reachable, aborting" >&2; exit 1
}
# Test a Bitmagnet-named indexer to clear its failure status.
# A successful test triggers RecordSuccess() which resets the backoff.
clear_status() {
local key indexer
key=$(${lib.extractArrApiKey ''"$3"''}) || return 0
indexer=$(${curl} -sf --max-time 10 \
-H "X-Api-Key: $key" "$2/api/$1/indexer" | \
${jq} 'first(.[] | select(.name | test("Bitmagnet"; "i")))') || return 0
[ -n "$indexer" ] && [ "$indexer" != "null" ] || return 0
${curl} -sf --max-time 30 \
-H "X-Api-Key: $key" -H "Content-Type: application/json" \
-X POST "$2/api/$1/indexer/test" -d "$indexer" > /dev/null
}
wait_for "http://localhost:${bitmagnetPort}" 12
wait_for "http://localhost:${prowlarrPort}/ping" 6
# Prowlarr first: downstream apps route searches through it.
clear_status v1 "http://localhost:${prowlarrPort}" "${prowlarrConfigXml}" || true
clear_status v3 "http://${bridgeAddr}:${sonarrPort}" "${sonarrConfigXml}" || true
clear_status v3 "http://${bridgeAddr}:${radarrPort}" "${radarrConfigXml}" || true
'';
in
{
imports = [
(lib.vpnNamespaceOpenPort service_configs.ports.private.bitmagnet.port "bitmagnet")
(lib.mkCaddyReverseProxy {
subdomain = "bitmagnet";
port = service_configs.ports.private.bitmagnet.port;
auth = true;
vpn = true;
})
];
services.bitmagnet = {
@@ -19,13 +76,38 @@
};
http_server = {
# TODO! make issue about this being a string and not a `port` type
- port = ":" + (builtins.toString service_configs.ports.private.bitmagnet.port);
+ port = ":" + (toString service_configs.ports.private.bitmagnet.port);
};
};
};
services.caddy.virtualHosts."bitmagnet.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy ${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString service_configs.ports.private.bitmagnet.port}
'';
# The upstream default (Restart=on-failure) leaves Bitmagnet dead after
# clean exits (e.g. systemd stop during deploy). Always restart it.
systemd.services.bitmagnet.serviceConfig = {
Restart = lib.mkForce "always";
RestartSec = 10;
};
# After Bitmagnet restarts, clear the escalating failure backoff across
# Prowlarr, Sonarr, and Radarr so searches resume immediately instead of
# waiting hours for the disable timers to expire.
systemd.services.prowlarr-bitmagnet-recovery = {
description = "Clear Prowlarr/Sonarr/Radarr failure status for Bitmagnet indexer";
after = [
"bitmagnet.service"
"prowlarr.service"
"sonarr.service"
"radarr.service"
];
bindsTo = [ "bitmagnet.service" ];
wantedBy = [ "bitmagnet.service" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
ExecStart = recoveryScript;
# Same VPN namespace as Bitmagnet and Prowlarr.
NetworkNamespacePath = "/run/netns/wg";
};
};
}
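
Review note on the jq filter in `clear_status`: the selection step can be sketched in Python as below. This is a minimal illustration, assuming the `/indexer` endpoint returns a JSON array of objects with a `name` field (which is what the filter implies). `pick_bitmagnet_indexer` and the sample payload are hypothetical and not part of this commit.

```python
import re


def pick_bitmagnet_indexer(indexers):
    """Return the first indexer whose name contains "bitmagnet"
    (case-insensitive), or None when nothing matches."""
    for indexer in indexers:
        if re.search("bitmagnet", indexer.get("name", ""), re.IGNORECASE):
            return indexer
    return None


# Hypothetical payload, for illustration only.
sample = [
    {"id": 1, "name": "SomeOtherIndexer"},
    {"id": 2, "name": "BitMagnet (local)"},
    {"id": 3, "name": "bitmagnet-backup"},
]
chosen = pick_bitmagnet_indexer(sample)
```

When nothing matches, jq's `first(...)` emits no output at all, which is why the shell script guards against an empty string as well as `null` before posting to `/indexer/test`.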


@@ -13,6 +13,10 @@
(lib.serviceFilePerms "vaultwarden" [
"Z ${service_configs.vaultwarden.path} 0700 vaultwarden vaultwarden"
])
(lib.mkFail2banJail {
name = "vaultwarden";
failregex = ''^.*Username or password is incorrect\. Try again\. IP: <HOST>\..*$'';
})
];
services.vaultwarden = {
@@ -38,18 +42,4 @@
}
'';
# Protect Vaultwarden login from brute force attacks
services.fail2ban.jails.vaultwarden = {
enabled = true;
settings = {
backend = "systemd";
port = "http,https";
# defaults: maxretry=5, findtime=10m, bantime=10m
};
filter.Definition = {
failregex = ''^.*Username or password is incorrect\. Try again\. IP: <HOST>\..*$'';
ignoreregex = "";
journalmatch = "_SYSTEMD_UNIT=vaultwarden.service";
};
};
}
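
Review note on the `failregex` passed to `lib.mkFail2banJail`: a quick way to sanity-check such a pattern outside fail2ban is sketched below. It assumes a simplified IPv4-only stand-in for fail2ban's `<HOST>` tag, and the sample log line is a hypothetical example of what Vaultwarden might emit, not captured output. `extract_host` and `HOST_GROUP` are illustrative names.

```python
import re

FAILREGEX = r"^.*Username or password is incorrect\. Try again\. IP: <HOST>\..*$"

# Simplified IPv4-only replacement for fail2ban's <HOST> tag.
# This is an assumption for the sketch; fail2ban's real expansion is broader.
HOST_GROUP = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"


def extract_host(line, failregex=FAILREGEX):
    """Return the offending host if the line matches the failregex, else None."""
    pattern = failregex.replace("<HOST>", HOST_GROUP)
    match = re.match(pattern, line)
    return match.group("host") if match else None


# Hypothetical log line, for illustration only.
sample_line = (
    "[vaultwarden::api::identity][ERROR] Username or password is incorrect. "
    "Try again. IP: 203.0.113.7. Username: alice@example.com."
)
host = extract_host(sample_line)
```

For the real thing, `fail2ban-regex` run against the journal is the authoritative check. The sketch only shows which part of the line the jail keys on.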


@@ -56,9 +56,19 @@ in
enable = true;
email = "titaniumtown@proton.me";
# Enable on-demand TLS for old domain redirects
# Certs are issued dynamically when subdomains are accessed
# Build with Njalla DNS provider for DNS-01 ACME challenges (wildcard certs)
package = pkgs.caddy.withPlugins {
plugins = [ "github.com/caddy-dns/njalla@v0.0.0-20250823094507-f709141f1fe6" ];
hash = "sha256-rrOAR6noTDpV/I/hZXxhz0OXVJKu0mFQRq87RUrpmzw=";
};
globalConfig = ''
# Wildcard cert for *.${newDomain} via DNS-01 challenge
acme_dns njalla {
api_token {env.NJALLA_API_TOKEN}
}
# On-demand TLS for old domain redirects
on_demand_tls {
ask http://localhost:9123/check
}
@@ -106,6 +116,9 @@ in
};
};
# Inject Njalla API token for DNS-01 challenge
systemd.services.caddy.serviceConfig.EnvironmentFile = config.age.secrets.njalla-api-token-env.path;
systemd.tmpfiles.rules = [
"d ${config.services.caddy.dataDir} 700 ${config.services.caddy.user} ${config.services.caddy.group}"
];


@@ -0,0 +1,7 @@
{
imports = [
./caddy.nix
# KEEP UNTIL 2028
./caddy_senior_project.nix
];
}

services/ddns-updater.nix Normal file

@@ -0,0 +1,27 @@
{
config,
lib,
...
}:
{
services.ddns-updater = {
enable = true;
environment = {
PERIOD = "5m";
# ddns-updater reads config from this path at runtime
CONFIG_FILEPATH = config.age.secrets.ddns-updater-config.path;
};
};
users.users.ddns-updater = {
isSystemUser = true;
group = "ddns-updater";
};
users.groups.ddns-updater = { };
systemd.services.ddns-updater.serviceConfig = {
DynamicUser = lib.mkForce false;
User = "ddns-updater";
Group = "ddns-updater";
};
}


@@ -6,6 +6,13 @@
...
}:
{
imports = [
(lib.mkCaddyReverseProxy {
domain = service_configs.firefox_syncserver.domain;
port = service_configs.ports.private.firefox_syncserver.port;
})
];
services.firefox-syncserver = {
enable = true;
database = {
@@ -33,7 +40,4 @@
];
};
services.caddy.virtualHosts."${service_configs.firefox_syncserver.domain}".extraConfig = ''
reverse_proxy :${builtins.toString service_configs.ports.private.firefox_syncserver.port}
'';
}


@@ -0,0 +1,50 @@
{
config,
lib,
pkgs,
service_configs,
...
}:
{
services.gitea-actions-runner.instances.muffin = {
enable = true;
name = "muffin";
url = config.services.gitea.settings.server.ROOT_URL;
tokenFile = config.age.secrets.gitea-runner-token.path;
labels = [ "nix:host" ];
hostPackages = with pkgs; [
bash
coreutils
curl
gawk
git
git-crypt
gnugrep
gnused
jq
nix
nodejs
openssh
];
settings = {
runner = {
capacity = 1;
timeout = "6h";
};
};
};
# Override DynamicUser to use our static gitea-runner user, and ensure
# the runner doesn't start before the co-located gitea instance is ready
# (upstream can't assume locality, so this dependency is ours to add).
systemd.services."gitea-runner-muffin" = {
requires = [ "gitea.service" ];
after = [ "gitea.service" ];
serviceConfig = {
DynamicUser = lib.mkForce false;
User = "gitea-runner";
Group = "gitea-runner";
};
environment.GIT_SSH_COMMAND = "ssh -i /run/agenix/ci-deploy-key -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/etc/ci-known-hosts";
};
}


@@ -11,6 +11,14 @@
(lib.serviceFilePerms "gitea" [
"Z ${config.services.gitea.stateDir} 0700 ${config.services.gitea.user} ${config.services.gitea.group}"
])
(lib.mkCaddyReverseProxy {
domain = service_configs.gitea.domain;
port = service_configs.ports.private.gitea.port;
})
(lib.mkFail2banJail {
name = "gitea";
failregex = "^.*Failed authentication attempt for .* from <HOST>:.*$";
})
];
services.gitea = {
@@ -37,13 +45,10 @@
};
# only I shall use gitea
service.DISABLE_REGISTRATION = true;
actions.ENABLED = true;
};
};
services.caddy.virtualHosts."${service_configs.gitea.domain}".extraConfig = ''
reverse_proxy :${builtins.toString config.services.gitea.settings.server.HTTP_PORT}
'';
services.postgresql = {
ensureDatabases = [ config.services.gitea.user ];
ensureUsers = [
@@ -57,18 +62,4 @@
services.openssh.settings.AllowUsers = [ config.services.gitea.user ];
# Protect Gitea login from brute force attacks
services.fail2ban.jails.gitea = {
enabled = true;
settings = {
backend = "systemd";
port = "http,https";
# defaults: maxretry=5, findtime=10m, bantime=10m
};
filter.Definition = {
failregex = "^.*Failed authentication attempt for .* from <HOST>:.*$";
ignoreregex = "";
journalmatch = "_SYSTEMD_UNIT=gitea.service";
};
};
}


@@ -0,0 +1,698 @@
{
...
}:
let
promDs = {
type = "prometheus";
uid = "prometheus";
};
dashboard = {
editable = true;
graphTooltip = 1;
schemaVersion = 39;
tags = [
"system"
"monitoring"
];
time = {
from = "now-6h";
to = "now";
};
timezone = "browser";
title = "System Overview";
uid = "system-overview";
annotations.list = [
{
name = "Jellyfin Streams";
datasource = {
type = "grafana";
uid = "-- Grafana --";
};
enable = true;
iconColor = "green";
showIn = 0;
type = "tags";
tags = [ "jellyfin" ];
}
{
name = "ZFS Scrubs";
datasource = {
type = "grafana";
uid = "-- Grafana --";
};
enable = true;
iconColor = "orange";
showIn = 0;
type = "tags";
tags = [ "zfs-scrub" ];
}
{
name = "LLM Requests";
datasource = promDs;
enable = true;
iconColor = "purple";
target = {
datasource = promDs;
expr = "llamacpp:requests_processing > 0";
instant = false;
range = true;
refId = "A";
};
titleFormat = "LLM inference";
}
];
panels = [
# -- Row 1: UPS --
{
id = 1;
type = "timeseries";
title = "UPS Power Draw";
gridPos = {
h = 8;
w = 8;
x = 0;
y = 0;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "apcupsd_ups_load_percent / 100 * apcupsd_nominal_power_watts";
legendFormat = "Power (W)";
refId = "A";
}
{
datasource = promDs;
expr = "avg_over_time((apcupsd_ups_load_percent / 100 * apcupsd_nominal_power_watts + 4.5)[5m:])";
legendFormat = "5m average (W)";
refId = "B";
}
];
fieldConfig = {
defaults = {
unit = "watt";
color.mode = "palette-classic";
custom = {
lineWidth = 2;
fillOpacity = 20;
spanNulls = true;
};
};
overrides = [
{
matcher = {
id = "byFrameRefID";
options = "A";
};
properties = [
{
id = "custom.lineStyle";
value = {
fill = "dot";
};
}
{
id = "custom.fillOpacity";
value = 10;
}
{
id = "custom.lineWidth";
value = 1;
}
{
id = "custom.pointSize";
value = 1;
}
];
}
{
matcher = {
id = "byFrameRefID";
options = "B";
};
properties = [
{
id = "custom.lineWidth";
value = 4;
}
{
id = "custom.fillOpacity";
value = 0;
}
];
}
];
};
}
{
id = 7;
type = "stat";
title = "Energy Usage (24h)";
gridPos = {
h = 8;
w = 4;
x = 8;
y = 0;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "avg_over_time((apcupsd_ups_load_percent / 100 * apcupsd_nominal_power_watts + 4.5)[24h:]) * 24 / 1000";
legendFormat = "";
refId = "A";
}
];
fieldConfig = {
defaults = {
unit = "kwatth";
decimals = 2;
thresholds = {
mode = "absolute";
steps = [
{
color = "green";
value = null;
}
{
color = "yellow";
value = 5;
}
{
color = "red";
value = 10;
}
];
};
};
overrides = [ ];
};
options = {
reduceOptions = {
calcs = [ "lastNotNull" ];
fields = "";
values = false;
};
colorMode = "value";
graphMode = "none";
};
}
{
id = 2;
type = "gauge";
title = "UPS Load";
gridPos = {
h = 8;
w = 6;
x = 12;
y = 0;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "apcupsd_ups_load_percent";
refId = "A";
}
];
fieldConfig = {
defaults = {
unit = "percent";
min = 0;
max = 100;
thresholds = {
mode = "absolute";
steps = [
{
color = "green";
value = null;
}
{
color = "yellow";
value = 70;
}
{
color = "red";
value = 90;
}
];
};
};
overrides = [ ];
};
options.reduceOptions = {
calcs = [ "lastNotNull" ];
fields = "";
values = false;
};
}
{
id = 3;
type = "gauge";
title = "UPS Battery";
gridPos = {
h = 8;
w = 6;
x = 18;
y = 0;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "apcupsd_battery_charge_percent";
refId = "A";
}
];
fieldConfig = {
defaults = {
unit = "percent";
min = 0;
max = 100;
thresholds = {
mode = "absolute";
steps = [
{
color = "red";
value = null;
}
{
color = "yellow";
value = 20;
}
{
color = "green";
value = 50;
}
];
};
};
overrides = [ ];
};
options.reduceOptions = {
calcs = [ "lastNotNull" ];
fields = "";
values = false;
};
}
# -- Row 2: System --
{
id = 4;
type = "timeseries";
title = "CPU Temperature";
gridPos = {
h = 8;
w = 12;
x = 0;
y = 8;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = ''node_hwmon_temp_celsius{chip=~"pci.*"}'';
legendFormat = "CPU {{sensor}}";
refId = "A";
}
];
fieldConfig = {
defaults = {
unit = "celsius";
color.mode = "palette-classic";
custom = {
lineWidth = 2;
fillOpacity = 10;
spanNulls = true;
};
};
overrides = [ ];
};
}
{
id = 5;
type = "stat";
title = "System Uptime";
gridPos = {
h = 8;
w = 6;
x = 12;
y = 8;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "time() - node_boot_time_seconds";
refId = "A";
}
];
fieldConfig = {
defaults = {
unit = "s";
thresholds = {
mode = "absolute";
steps = [
{
color = "green";
value = null;
}
];
};
};
overrides = [ ];
};
options = {
reduceOptions = {
calcs = [ "lastNotNull" ];
fields = "";
values = false;
};
colorMode = "value";
graphMode = "none";
};
}
{
id = 6;
type = "stat";
title = "Jellyfin Active Streams";
gridPos = {
h = 8;
w = 6;
x = 18;
y = 8;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "count(jellyfin_now_playing_state) or vector(0)";
refId = "A";
}
];
fieldConfig = {
defaults = {
thresholds = {
mode = "absolute";
steps = [
{
color = "green";
value = null;
}
{
color = "yellow";
value = 3;
}
{
color = "red";
value = 6;
}
];
};
};
overrides = [ ];
};
options = {
reduceOptions = {
calcs = [ "lastNotNull" ];
fields = "";
values = false;
};
colorMode = "value";
graphMode = "area";
};
}
# -- Row 3: qBittorrent --
{
id = 11;
type = "timeseries";
title = "qBittorrent Speed";
gridPos = {
h = 8;
w = 24;
x = 0;
y = 16;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "sum(qbit_dlspeed) or vector(0)";
legendFormat = "Download";
refId = "A";
}
{
datasource = promDs;
expr = "sum(qbit_upspeed) or vector(0)";
legendFormat = "Upload";
refId = "B";
}
{
datasource = promDs;
expr = "avg_over_time((sum(qbit_dlspeed) or vector(0))[10m:])";
legendFormat = "Download (10m avg)";
refId = "C";
}
{
datasource = promDs;
expr = "avg_over_time((sum(qbit_upspeed) or vector(0))[10m:])";
legendFormat = "Upload (10m avg)";
refId = "D";
}
];
fieldConfig = {
defaults = {
unit = "binBps";
min = 0;
color.mode = "palette-classic";
custom = {
lineWidth = 1;
fillOpacity = 10;
spanNulls = true;
};
};
overrides = [
{
matcher = {
id = "byFrameRefID";
options = "A";
};
properties = [
{
id = "color";
value = {
fixedColor = "green";
mode = "fixed";
};
}
{
id = "custom.fillOpacity";
value = 5;
}
];
}
{
matcher = {
id = "byFrameRefID";
options = "B";
};
properties = [
{
id = "color";
value = {
fixedColor = "blue";
mode = "fixed";
};
}
{
id = "custom.fillOpacity";
value = 5;
}
];
}
{
matcher = {
id = "byFrameRefID";
options = "C";
};
properties = [
{
id = "color";
value = {
fixedColor = "green";
mode = "fixed";
};
}
{
id = "custom.lineWidth";
value = 3;
}
{
id = "custom.fillOpacity";
value = 0;
}
];
}
{
matcher = {
id = "byFrameRefID";
options = "D";
};
properties = [
{
id = "color";
value = {
fixedColor = "blue";
mode = "fixed";
};
}
{
id = "custom.lineWidth";
value = 3;
}
{
id = "custom.fillOpacity";
value = 0;
}
];
}
];
};
}
# -- Row 4: Intel GPU --
{
id = 8;
type = "timeseries";
title = "Intel GPU Utilization";
gridPos = {
h = 8;
w = 24;
x = 0;
y = 24;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "igpu_engines_busy_percent";
legendFormat = "{{engine}}";
refId = "A";
}
];
fieldConfig = {
defaults = {
unit = "percent";
min = 0;
max = 100;
color.mode = "palette-classic";
custom = {
lineWidth = 2;
fillOpacity = 10;
spanNulls = true;
};
};
overrides = [ ];
};
}
# -- Row 5: Storage --
{
id = 12;
type = "timeseries";
title = "ZFS Pool Utilization";
gridPos = {
h = 8;
w = 12;
x = 0;
y = 32;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "zfs_pool_allocated_bytes{pool=\"tank\"} / zfs_pool_size_bytes{pool=\"tank\"} * 100";
legendFormat = "tank";
refId = "A";
}
{
datasource = promDs;
expr = "zfs_pool_allocated_bytes{pool=\"hdds\"} / zfs_pool_size_bytes{pool=\"hdds\"} * 100";
legendFormat = "hdds";
refId = "B";
}
];
fieldConfig = {
defaults = {
unit = "percent";
min = 0;
max = 100;
color.mode = "palette-classic";
custom = {
lineWidth = 2;
fillOpacity = 20;
spanNulls = true;
};
};
overrides = [ ];
};
}
{
id = 13;
type = "timeseries";
title = "Boot Drive Partitions";
gridPos = {
h = 8;
w = 12;
x = 12;
y = 32;
};
datasource = promDs;
targets = [
{
datasource = promDs;
expr = "(node_filesystem_size_bytes{mountpoint=\"/boot\"} - node_filesystem_avail_bytes{mountpoint=\"/boot\"}) / node_filesystem_size_bytes{mountpoint=\"/boot\"} * 100";
legendFormat = "/boot";
refId = "A";
}
{
datasource = promDs;
expr = "(node_filesystem_size_bytes{mountpoint=\"/persistent\"} - node_filesystem_avail_bytes{mountpoint=\"/persistent\"}) / node_filesystem_size_bytes{mountpoint=\"/persistent\"} * 100";
legendFormat = "/persistent";
refId = "B";
}
{
datasource = promDs;
expr = "(node_filesystem_size_bytes{mountpoint=\"/nix\"} - node_filesystem_avail_bytes{mountpoint=\"/nix\"}) / node_filesystem_size_bytes{mountpoint=\"/nix\"} * 100";
legendFormat = "/nix";
refId = "C";
}
];
fieldConfig = {
defaults = {
unit = "percent";
min = 0;
max = 100;
color.mode = "palette-classic";
custom = {
lineWidth = 2;
fillOpacity = 20;
spanNulls = true;
};
};
overrides = [ ];
};
}
];
};
in
{
environment.etc."grafana-dashboards/system-overview.json" = {
text = builtins.toJSON dashboard;
mode = "0444";
};
}
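
Review note on the "UPS Power Draw" and "Energy Usage (24h)" queries: the arithmetic behind the two PromQL expressions can be sketched as below. The 4.5 W offset and the `* 24 / 1000` conversion are taken from the queries above. The 900 W nominal rating and 20 % load are made-up inputs, and the function names are hypothetical.

```python
def ups_power_watts(load_percent, nominal_watts, offset_watts=4.5):
    """Estimated draw: load fraction times nominal rating, plus the fixed
    offset used in the averaged dashboard queries."""
    return load_percent / 100 * nominal_watts + offset_watts


def energy_kwh(avg_power_watts, hours=24):
    """Convert an average power in watts over `hours` into kilowatt-hours."""
    return avg_power_watts * hours / 1000


# Made-up example inputs: 20 % load on a UPS rated at 900 W.
avg_watts = ups_power_watts(20, 900)
daily_kwh = energy_kwh(avg_watts)
```

With these inputs the average draw is 184.5 W, or about 4.43 kWh per day, which would sit just under the panel's yellow threshold of 5. Note that the raw "Power (W)" series (refId A) omits the offset, while the 5m average and the 24h stat include it.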


@@ -0,0 +1,10 @@
{
imports = [
./grafana.nix
./prometheus.nix
./dashboard.nix
./exporters.nix
./jellyfin-annotations.nix
./zfs-scrub-annotations.nix
];
}


@@ -0,0 +1,112 @@
{
config,
pkgs,
inputs,
service_configs,
lib,
...
}:
let
jellyfinExporterPort = service_configs.ports.private.jellyfin_exporter.port;
qbitExporterPort = service_configs.ports.private.qbittorrent_exporter.port;
igpuExporterPort = service_configs.ports.private.igpu_exporter.port;
in
{
# -- Jellyfin Prometheus Exporter --
# Replaces custom jellyfin-collector.nix textfile timer.
# Exposes per-session metrics (jellyfin_now_playing_state) and library stats.
systemd.services.jellyfin-exporter =
lib.mkIf (config.services.grafana.enable && config.services.jellyfin.enable)
{
description = "Prometheus exporter for Jellyfin";
after = [
"network.target"
"jellyfin.service"
];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = lib.getExe (
pkgs.writeShellApplication {
name = "jellyfin-exporter-wrapper";
runtimeInputs = [ pkgs.jellyfin-exporter ];
text = ''
exec jellyfin_exporter \
--jellyfin.address=http://127.0.0.1:${toString service_configs.ports.private.jellyfin.port} \
--jellyfin.token="$(cat "$CREDENTIALS_DIRECTORY/jellyfin-api-key")" \
--web.listen-address=127.0.0.1:${toString jellyfinExporterPort}
'';
}
);
Restart = "on-failure";
RestartSec = "10s";
DynamicUser = true;
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
MemoryDenyWriteExecute = true;
LoadCredential = "jellyfin-api-key:${config.age.secrets.jellyfin-api-key.path}";
};
};
# -- qBittorrent Prometheus Exporter --
# Replaces custom qbittorrent-collector.nix textfile timer.
# Exposes per-torrent metrics (qbit_dlspeed, qbit_upspeed) and aggregate stats.
# qBittorrent runs in a VPN namespace; the exporter reaches it via namespace address.
systemd.services.qbittorrent-exporter =
lib.mkIf (config.services.grafana.enable && config.services.qbittorrent.enable)
{
description = "Prometheus exporter for qBittorrent";
after = [
"network.target"
"qbittorrent.service"
];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart =
lib.getExe' inputs.qbittorrent-metrics-exporter.packages.${pkgs.stdenv.hostPlatform.system}.default
"qbittorrent-metrics-exporter";
Restart = "on-failure";
RestartSec = "10s";
DynamicUser = true;
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
};
environment = {
HOST = "127.0.0.1";
PORT = toString qbitExporterPort;
SCRAPE_INTERVAL = "15";
BACKEND = "in-memory";
# qBittorrent has AuthSubnetWhitelist=0.0.0.0/0, so no real password needed.
# The exporter still expects the env var to be set.
QBITTORRENT_PASSWORD = "unused";
QBITTORRENT_USERNAME = "admin";
TORRENT_HOSTS = "qbit:main=http://${config.vpnNamespaces.wg.namespaceAddress}:${toString config.services.qbittorrent.webuiPort}|http://${config.vpnNamespaces.wg.namespaceAddress}:${toString config.services.qbittorrent.webuiPort}";
RUST_LOG = "warn";
};
};
# -- Intel GPU Prometheus Exporter --
# Replaces custom intel-gpu-collector.nix + intel-gpu-collector.py textfile timer.
# Exposes engine busy%, frequency, and RC6 metrics via /metrics.
# Requires privileged access to GPU debug interfaces (intel_gpu_top).
systemd.services.igpu-exporter = lib.mkIf config.services.grafana.enable {
description = "Prometheus exporter for Intel integrated GPU";
wantedBy = [ "multi-user.target" ];
path = [ pkgs.intel-gpu-tools ];
serviceConfig = {
ExecStart = lib.getExe pkgs.igpu-exporter;
Restart = "on-failure";
RestartSec = "10s";
# intel_gpu_top requires root-level access to GPU debug interfaces
ProtectHome = true;
PrivateTmp = true;
};
environment = {
PORT = toString igpuExporterPort;
REFRESH_PERIOD_MS = "30000";
};
};
}


@@ -0,0 +1,103 @@
{
config,
service_configs,
lib,
...
}:
{
imports = [
(lib.serviceMountWithZpool "grafana" service_configs.zpool_ssds [
service_configs.grafana.dir
])
(lib.serviceFilePerms "grafana" [
"Z ${service_configs.grafana.dir} 0700 grafana grafana"
])
(lib.mkCaddyReverseProxy {
domain = service_configs.grafana.domain;
port = service_configs.ports.private.grafana.port;
auth = true;
})
];
services.grafana = {
enable = true;
dataDir = service_configs.grafana.dir;
settings = {
server = {
http_addr = "127.0.0.1";
http_port = service_configs.ports.private.grafana.port;
domain = service_configs.grafana.domain;
root_url = "https://${service_configs.grafana.domain}";
};
database = {
type = "postgres";
host = service_configs.postgres.socket;
user = "grafana";
};
"auth.anonymous" = {
enabled = true;
org_role = "Admin";
};
"auth.basic".enabled = false;
"auth".disable_login_form = true;
analytics.reporting_enabled = false;
feature_toggles.enable = "dataConnectionsConsole=false";
users.default_theme = "dark";
# Disable unused built-in integrations
alerting.enabled = false;
"unified_alerting".enabled = false;
explore.enabled = false;
news.news_feed_enabled = false;
plugins = {
enable_alpha = false;
plugin_admin_enabled = false;
};
};
provision = {
datasources.settings = {
apiVersion = 1;
datasources = [
{
name = "Prometheus";
type = "prometheus";
url = "http://127.0.0.1:${toString service_configs.ports.private.prometheus.port}";
access = "proxy";
isDefault = true;
editable = false;
uid = "prometheus";
}
];
};
dashboards.settings.providers = [
{
name = "system";
type = "file";
options.path = "/etc/grafana-dashboards";
disableDeletion = true;
updateIntervalSeconds = 60;
}
];
};
};
services.postgresql = {
ensureDatabases = [ "grafana" ];
ensureUsers = [
{
name = "grafana";
ensureDBOwnership = true;
ensureClauses.login = true;
}
];
};
}


@@ -0,0 +1,18 @@
{
config,
service_configs,
lib,
...
}:
lib.mkIf (config.services.grafana.enable && config.services.jellyfin.enable) (
lib.mkGrafanaAnnotationService {
name = "jellyfin";
description = "Jellyfin stream annotation service for Grafana";
script = ./jellyfin-annotations.py;
environment = {
JELLYFIN_URL = "http://127.0.0.1:${toString service_configs.ports.private.jellyfin.port}";
POLL_INTERVAL = "30";
};
loadCredential = "jellyfin-api-key:${config.age.secrets.jellyfin-api-key.path}";
}
)


@@ -0,0 +1,233 @@
#!/usr/bin/env python3
import json
import os
import sys
import time
import urllib.request
from pathlib import Path

JELLYFIN_URL = os.environ.get("JELLYFIN_URL", "http://127.0.0.1:8096")
GRAFANA_URL = os.environ.get("GRAFANA_URL", "http://127.0.0.1:3000")
STATE_FILE = os.environ.get("STATE_FILE", "/var/lib/jellyfin-annotations/state.json")
POLL_INTERVAL = int(os.environ.get("POLL_INTERVAL", "30"))


def get_api_key():
    cred_dir = os.environ.get("CREDENTIALS_DIRECTORY")
    if cred_dir:
        return Path(cred_dir, "jellyfin-api-key").read_text().strip()
    for p in ["/run/agenix/jellyfin-api-key"]:
        if Path(p).exists():
            return Path(p).read_text().strip()
    sys.exit("ERROR: Cannot find jellyfin-api-key")


def http_json(method, url, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


def get_active_sessions(api_key):
    try:
        req = urllib.request.Request(
            f"{JELLYFIN_URL}/Sessions?api_key={api_key}",
            headers={"Accept": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            sessions = json.loads(resp.read())
        return [s for s in sessions if s.get("NowPlayingItem")]
    except Exception as e:
        print(f"Error fetching sessions: {e}", file=sys.stderr)
        return None


def _codec(name):
    if not name:
        return ""
    aliases = {"h264": "H.264", "h265": "H.265", "hevc": "H.265", "av1": "AV1",
               "vp9": "VP9", "vp8": "VP8", "mpeg4": "MPEG-4", "mpeg2video": "MPEG-2",
               "aac": "AAC", "ac3": "AC3", "eac3": "EAC3", "dts": "DTS",
               "truehd": "TrueHD", "mp3": "MP3", "opus": "Opus", "flac": "FLAC",
               "vorbis": "Vorbis"}
    return aliases.get(name.lower(), name.upper())


def _res(width, height):
    if not height:
        return ""
    common = {2160: "4K", 1440: "1440p", 1080: "1080p", 720: "720p",
              480: "480p", 360: "360p"}
    return common.get(height, f"{height}p")


def _channels(n):
    labels = {1: "Mono", 2: "Stereo", 6: "5.1", 7: "6.1", 8: "7.1"}
    return labels.get(n, f"{n}ch") if n else ""


def format_label(session):
    user = session.get("UserName", "Unknown")
    item = session.get("NowPlayingItem", {}) or {}
    transcode = session.get("TranscodingInfo") or {}
    play_state = session.get("PlayState") or {}
    client = session.get("Client", "")
    device = session.get("DeviceName", "")

    name = item.get("Name", "Unknown")
    series = item.get("SeriesName", "")
    season = item.get("ParentIndexNumber")
    episode = item.get("IndexNumber")
    media_type = item.get("Type", "")

    if series and season and episode:
        title = f"{series} S{season:02d}E{episode:02d} \u2013 {name}"
    elif series:
        title = f"{series} \u2013 {name}"
    elif media_type == "Movie":
        title = f"{name} (movie)"
    else:
        title = name

    play_method = play_state.get("PlayMethod", "")
    if play_method == "DirectPlay":
        method = "Direct Play"
    elif play_method == "DirectStream":
        method = "Direct Stream"
    elif play_method == "Transcode" or transcode:
        method = "Transcode"
    else:
        method = "Direct Play"

    media_streams = item.get("MediaStreams") or []
    video_streams = [s for s in media_streams if s.get("Type") == "Video"]
    audio_streams = [s for s in media_streams if s.get("Type") == "Audio"]
    default_audio = next((s for s in audio_streams if s.get("IsDefault")), None)
    audio_stream = default_audio or (audio_streams[0] if audio_streams else {})
    video_stream = video_streams[0] if video_streams else {}

    src_vcodec = _codec(video_stream.get("Codec", ""))
    src_res = _res(video_stream.get("Width") or item.get("Width"),
                   video_stream.get("Height") or item.get("Height"))
    src_acodec = _codec(audio_stream.get("Codec", ""))
    src_channels = _channels(audio_stream.get("Channels"))

    is_video_direct = transcode.get("IsVideoDirect", True)
    is_audio_direct = transcode.get("IsAudioDirect", True)

    if transcode and not is_video_direct:
        dst_vcodec = _codec(transcode.get("VideoCodec", ""))
        dst_res = _res(transcode.get("Width"), transcode.get("Height")) or src_res
        if src_vcodec and dst_vcodec and src_vcodec != dst_vcodec:
            video_part = f"{src_vcodec}\u2192{dst_vcodec} {dst_res}".strip()
        else:
            video_part = f"{dst_vcodec or src_vcodec} {dst_res}".strip()
    else:
        video_part = f"{src_vcodec} {src_res}".strip()

    if transcode and not is_audio_direct:
        dst_acodec = _codec(transcode.get("AudioCodec", ""))
        dst_channels = _channels(transcode.get("AudioChannels")) or src_channels
        if src_acodec and dst_acodec and src_acodec != dst_acodec:
            audio_part = f"{src_acodec}\u2192{dst_acodec} {dst_channels}".strip()
        else:
            audio_part = f"{dst_acodec or src_acodec} {dst_channels}".strip()
    else:
        audio_part = f"{src_acodec} {src_channels}".strip()

    bitrate = transcode.get("Bitrate") or item.get("Bitrate")
    bitrate_part = f"{bitrate / 1_000_000:.1f} Mbps" if bitrate else ""

    reasons = transcode.get("TranscodeReasons") or []
    reason_part = f"[{', '.join(reasons)}]" if reasons else ""

    stream_parts = [p for p in [method, video_part, audio_part, bitrate_part, reason_part] if p]
    client_str = " \u00b7 ".join(filter(None, [client, device]))

    lines = [f"{user}: {title}", " | ".join(stream_parts)]
    if client_str:
        lines.append(client_str)
    return "\n".join(lines)


def load_state():
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_state(state):
    os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_FILE)


def grafana_post(label, start_ms):
    try:
        result = http_json(
            "POST",
            f"{GRAFANA_URL}/api/annotations",
            {"time": start_ms, "text": label, "tags": ["jellyfin"]},
        )
        return result.get("id")
    except Exception as e:
        print(f"Error posting annotation: {e}", file=sys.stderr)
        return None


def grafana_close(grafana_id, end_ms):
    try:
        http_json(
            "PATCH",
            f"{GRAFANA_URL}/api/annotations/{grafana_id}",
            {"timeEnd": end_ms},
        )
    except Exception as e:
        print(f"Error closing annotation {grafana_id}: {e}", file=sys.stderr)


def main():
    api_key = get_api_key()
    state = load_state()

    while True:
        now_ms = int(time.time() * 1000)
        sessions = get_active_sessions(api_key)

        if sessions is not None:
            current_ids = {s["Id"] for s in sessions}

            for s in sessions:
                sid = s["Id"]
                if sid not in state:
                    label = format_label(s)
                    grafana_id = grafana_post(label, now_ms)
                    if grafana_id is not None:
                        state[sid] = {
                            "grafana_id": grafana_id,
                            "label": label,
                            "start_ms": now_ms,
                        }
                        save_state(state)

            for sid in [k for k in state if k not in current_ids]:
                info = state.pop(sid)
                grafana_close(info["grafana_id"], now_ms)
                save_state(state)

        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    main()
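
Review note on the polling loop in `main()`: each tick compares the persisted state against the currently playing session IDs, opens annotations for new sessions and closes annotations for sessions that have ended. The pure function below isolates that diffing step so it can be reasoned about without Jellyfin or Grafana. `reconcile` is a hypothetical helper, not something the script defines.

```python
def reconcile(state, active_ids):
    """Return (to_open, to_close): session IDs that need a new annotation,
    and tracked IDs whose annotation should be closed."""
    active = list(active_ids)
    active_set = set(active)
    to_open = [sid for sid in active if sid not in state]
    to_close = [sid for sid in state if sid not in active_set]
    return to_open, to_close


# Illustrative state: sessions "a" and "b" already have annotations;
# "b" is still playing and "c" has just started.
tracked = {"a": {"grafana_id": 10}, "b": {"grafana_id": 11}}
to_open, to_close = reconcile(tracked, ["b", "c"])
```

In the script itself a failed `get_active_sessions` returns `None` and the tick is skipped entirely, so a transient Jellyfin outage does not close every open annotation.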


@@ -0,0 +1,110 @@
{
service_configs,
lib,
...
}:
let
textfileDir = "/var/lib/prometheus-node-exporter-textfiles";
in
{
imports = [
(lib.serviceMountWithZpool "prometheus" service_configs.zpool_ssds [
"/var/lib/prometheus"
])
(lib.serviceFilePerms "prometheus" [
"Z /var/lib/prometheus 0700 prometheus prometheus"
])
];
services.prometheus = {
enable = true;
port = service_configs.ports.private.prometheus.port;
listenAddress = "127.0.0.1";
stateDir = "prometheus";
retentionTime = "100y"; # effectively keep forever; with no size limit set, a zero retention time falls back to Prometheus's 15d default
exporters = {
node = {
enable = true;
port = service_configs.ports.private.prometheus_node.port;
listenAddress = "127.0.0.1";
enabledCollectors = [
"hwmon"
"systemd"
"textfile"
];
extraFlags = [
"--collector.textfile.directory=${textfileDir}"
];
};
apcupsd = {
enable = true;
port = service_configs.ports.private.prometheus_apcupsd.port;
listenAddress = "127.0.0.1";
apcupsdAddress = "127.0.0.1:3551";
};
zfs = {
enable = true;
port = service_configs.ports.private.prometheus_zfs.port;
listenAddress = "127.0.0.1";
};
};
scrapeConfigs = [
{
job_name = "prometheus";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.prometheus.port}" ]; }
];
}
{
job_name = "node";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.prometheus_node.port}" ]; }
];
}
{
job_name = "apcupsd";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.prometheus_apcupsd.port}" ]; }
];
}
{
job_name = "llama-cpp";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.llama_cpp.port}" ]; }
];
}
{
job_name = "jellyfin";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.jellyfin_exporter.port}" ]; }
];
}
{
job_name = "qbittorrent";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.qbittorrent_exporter.port}" ]; }
];
}
{
job_name = "igpu";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.igpu_exporter.port}" ]; }
];
}
{
job_name = "zfs";
static_configs = [
{ targets = [ "127.0.0.1:${toString service_configs.ports.private.prometheus_zfs.port}" ]; }
];
}
];
};
systemd.tmpfiles.rules = [
"d ${textfileDir} 0755 root root -"
];
}


@@ -0,0 +1,36 @@
{
config,
pkgs,
service_configs,
lib,
...
}:
let
grafanaUrl = "http://127.0.0.1:${toString service_configs.ports.private.grafana.port}";
script = pkgs.writeShellApplication {
name = "zfs-scrub-annotations";
runtimeInputs = with pkgs; [
curl
jq
coreutils
gnugrep
gnused
config.boot.zfs.package
];
text = builtins.readFile ./zfs-scrub-annotations.sh;
};
in
lib.mkIf (config.services.grafana.enable && config.services.zfs.autoScrub.enable) {
systemd.services.zfs-scrub = {
environment = {
GRAFANA_URL = grafanaUrl;
STATE_DIR = "/run/zfs-scrub-annotations";
};
serviceConfig = {
RuntimeDirectory = "zfs-scrub-annotations";
ExecStartPre = [ "-${lib.getExe script} start" ];
ExecStopPost = [ "${lib.getExe script} stop" ];
};
};
}


@@ -0,0 +1,55 @@
#!/usr/bin/env bash
# ZFS scrub annotation script for Grafana
# Usage: zfs-scrub-annotations.sh {start|stop}
# Required env: GRAFANA_URL, STATE_DIR
# Required on PATH: zpool, curl, jq, paste, date, grep, sed
set -euo pipefail
ACTION="${1:-}"
GRAFANA_URL="${GRAFANA_URL:?GRAFANA_URL required}"
STATE_DIR="${STATE_DIR:?STATE_DIR required}"
case "$ACTION" in
start)
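# paste -d cycles through its delimiter list, so ', ' would alternate "," and " ";
# join on "," first and then widen to ", ".
POOLS=$(zpool list -H -o name | paste -sd ',' | sed 's/,/, /g')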
NOW_MS=$(date +%s%3N)
RESPONSE=$(curl -sf --max-time 5 \
-X POST "$GRAFANA_URL/api/annotations" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg text "ZFS scrub: $POOLS" --argjson time "$NOW_MS" \
'{time: $time, text: $text, tags: ["zfs-scrub"]}')" \
) || exit 0
echo "$RESPONSE" | jq -r '.id' > "$STATE_DIR/annotation-id"
;;
stop)
ANN_ID=$(cat "$STATE_DIR/annotation-id" 2>/dev/null) || exit 0
[ -z "$ANN_ID" ] && exit 0
NOW_MS=$(date +%s%3N)
RESULTS=""
while IFS= read -r pool; do
scan_line=$(zpool status "$pool" | grep "scan:" | sed 's/^[[:space:]]*//')
RESULTS="${RESULTS}${pool}: ${scan_line}"$'\n'
done < <(zpool list -H -o name)
TEXT=$(printf "ZFS scrub completed\n%s" "$RESULTS")
curl -sf --max-time 5 \
-X PATCH "$GRAFANA_URL/api/annotations/$ANN_ID" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg text "$TEXT" --argjson timeEnd "$NOW_MS" \
'{timeEnd: $timeEnd, text: $text}')" || true
rm -f "$STATE_DIR/annotation-id"
;;
*)
echo "Usage: $0 {start|stop}" >&2
exit 1
;;
esac
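
The two Grafana calls made by the script above can be sketched in Python as payload builders. This is only an illustrative sketch, not part of the commit: `build_start_payload` and `build_stop_payload` are hypothetical helper names, and the field names (`time`, `timeEnd`, `text`, `tags`) are copied from the jq expressions in the script. The sketch joins pool names with a comma and a space.

```python
import time


def build_start_payload(pools, now_ms=None):
    """Body for POST $GRAFANA_URL/api/annotations when the scrub starts."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # Grafana expects epoch milliseconds
    return {
        "time": now_ms,
        "text": "ZFS scrub: " + ", ".join(pools),
        "tags": ["zfs-scrub"],
    }


def build_stop_payload(scan_lines, now_ms=None):
    """Body for PATCH $GRAFANA_URL/api/annotations/<id> when the scrub ends.

    scan_lines maps each pool name to its 'scan:' status line.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    results = "\n".join(f"{pool}: {line}" for pool, line in scan_lines.items())
    return {"timeEnd": now_ms, "text": "ZFS scrub completed\n" + results}
```

Passing `now_ms` explicitly keeps the builders deterministic, which makes them easy to check without a running Grafana instance.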

services/harmonia.nix Normal file

@@ -0,0 +1,38 @@
{
config,
lib,
service_configs,
...
}:
{
imports = [
(lib.serviceFilePerms "harmonia" [
"Z /run/agenix/harmonia-sign-key 0400 harmonia harmonia"
])
];
services.harmonia = {
enable = true;
signKeyPaths = [ config.age.secrets.harmonia-sign-key.path ];
settings.bind = "127.0.0.1:${toString service_configs.ports.private.harmonia.port}";
};
# serve latest deploy store paths (unauthenticated — just a path string)
# CI writes to /var/lib/dotfiles-deploy/<hostname> after building
services.caddy.virtualHosts."nix-cache.${service_configs.https.domain}".extraConfig = ''
handle_path /deploy/* {
root * /var/lib/dotfiles-deploy
file_server
}
handle {
import ${config.age.secrets.nix-cache-auth.path}
reverse_proxy :${toString service_configs.ports.private.harmonia.port}
}
'';
# directory for CI to record latest deploy store paths
systemd.tmpfiles.rules = [
"d /var/lib/dotfiles-deploy 0755 gitea-runner gitea-runner"
];
}


@@ -16,6 +16,15 @@
(lib.serviceFilePerms "immich-server" [
"Z ${config.services.immich.mediaLocation} 0770 ${config.services.immich.user} ${config.services.immich.group}"
])
(lib.mkCaddyReverseProxy {
subdomain = "immich";
port = service_configs.ports.private.immich.port;
})
(lib.mkFail2banJail {
name = "immich";
unitName = "immich-server.service";
failregex = "^.*Failed login attempt for user .* from ip address <HOST>.*$";
})
];
services.immich = {
@@ -29,10 +38,6 @@
};
};
services.caddy.virtualHosts."immich.${service_configs.https.domain}".extraConfig = ''
reverse_proxy :${builtins.toString config.services.immich.port}
'';
environment.systemPackages = with pkgs; [
immich-go
];
@@ -42,18 +47,4 @@
"render"
];
# Protect Immich login from brute force attacks
services.fail2ban.jails.immich = {
enabled = true;
settings = {
backend = "systemd";
port = "http,https";
# defaults: maxretry=5, findtime=10m, bantime=10m
};
filter.Definition = {
failregex = "^.*Failed login attempt for user .* from ip address <HOST>.*$";
ignoreregex = "";
journalmatch = "_SYSTEMD_UNIT=immich-server.service";
};
};
}


@@ -1,57 +0,0 @@
{
pkgs,
service_configs,
config,
...
}:
{
systemd.services."jellyfin-qbittorrent-monitor" = {
description = "Monitor Jellyfin streaming and control qBittorrent rate limits";
after = [
"network.target"
"jellyfin.service"
"qbittorrent.service"
];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "simple";
ExecStart = pkgs.writeShellScript "jellyfin-monitor-start" ''
export JELLYFIN_API_KEY=$(cat $CREDENTIALS_DIRECTORY/jellyfin-api-key)
exec ${
pkgs.python3.withPackages (ps: with ps; [ requests ])
}/bin/python ${./jellyfin-qbittorrent-monitor.py}
'';
Restart = "always";
RestartSec = "10s";
# Security hardening
DynamicUser = true;
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
ProtectKernelTunables = true;
ProtectKernelModules = true;
ProtectControlGroups = true;
MemoryDenyWriteExecute = true;
RestrictRealtime = true;
RestrictSUIDSGID = true;
RemoveIPC = true;
# Load credentials from agenix secrets
LoadCredential = "jellyfin-api-key:${config.age.secrets.jellyfin-api-key.path}";
};
environment = {
JELLYFIN_URL = "http://localhost:${builtins.toString service_configs.ports.private.jellyfin.port}";
QBITTORRENT_URL = "http://${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString service_configs.ports.private.torrent.port}";
CHECK_INTERVAL = "30";
# Bandwidth budget configuration
TOTAL_BANDWIDTH_BUDGET = "30000000"; # 30 Mbps in bits per second
SERVICE_BUFFER = "5000000"; # 5 Mbps reserved for other services (bps)
DEFAULT_STREAM_BITRATE = "10000000"; # 10 Mbps fallback when bitrate unknown (bps)
MIN_TORRENT_SPEED = "100"; # KB/s - below this, pause torrents instead
STREAM_BITRATE_HEADROOM = "1.1"; # multiplier per stream for bitrate fluctuations
};
};
}


@@ -0,0 +1,6 @@
{
imports = [
./jellyfin.nix
./jellyfin-qbittorrent-monitor.nix
];
}


@@ -0,0 +1,127 @@
{
pkgs,
service_configs,
config,
lib,
...
}:
let
webhookPlugin = import ./jellyfin-webhook-plugin.nix { inherit pkgs lib; };
jellyfinPort = service_configs.ports.private.jellyfin.port;
webhookPort = service_configs.ports.private.jellyfin_qbittorrent_monitor_webhook.port;
in
lib.mkIf config.services.jellyfin.enable {
# Materialise the Jellyfin Webhook plugin into Jellyfin's plugins dir before
# Jellyfin starts. Jellyfin rewrites meta.json at runtime, so a read-only
# nix-store symlink would EACCES -- we copy instead.
#
# `wantedBy = [ "jellyfin.service" ]` alone is insufficient on initial rollout:
# if jellyfin is already running at activation time, systemd won't start the
# oneshot until the next jellyfin restart. `restartTriggers` on jellyfin pinned
# to the plugin package + install script forces that restart whenever either
# changes, which invokes this unit via the `before`/`wantedBy` chain.
systemd.services.jellyfin-webhook-install = {
before = [ "jellyfin.service" ];
wantedBy = [ "jellyfin.service" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
User = config.services.jellyfin.user;
Group = config.services.jellyfin.group;
ExecStart = webhookPlugin.mkInstallScript {
pluginsDir = "${config.services.jellyfin.dataDir}/plugins";
};
};
};
systemd.services.jellyfin.restartTriggers = [
webhookPlugin.package
(webhookPlugin.mkInstallScript {
pluginsDir = "${config.services.jellyfin.dataDir}/plugins";
})
];
# After Jellyfin starts, POST the plugin configuration so the webhook
# targets the monitor's receiver. Idempotent; runs on every boot.
systemd.services.jellyfin-webhook-configure = {
after = [ "jellyfin.service" ];
wants = [ "jellyfin.service" ];
before = [ "jellyfin-qbittorrent-monitor.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
DynamicUser = true;
LoadCredential = "jellyfin-api-key:${config.age.secrets.jellyfin-api-key.path}";
ExecStart = webhookPlugin.mkConfigureScript {
jellyfinUrl = "http://127.0.0.1:${toString jellyfinPort}";
webhooks = [
{
name = "qBittorrent Monitor";
uri = "http://127.0.0.1:${toString webhookPort}/";
notificationTypes = [
"PlaybackStart"
"PlaybackProgress"
"PlaybackStop"
];
}
];
};
};
};
systemd.services."jellyfin-qbittorrent-monitor" = {
description = "Monitor Jellyfin streaming and control qBittorrent rate limits";
after = [
"network.target"
"jellyfin.service"
"qbittorrent.service"
"jellyfin-webhook-configure.service"
];
wants = [ "jellyfin-webhook-configure.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "simple";
ExecStart = pkgs.writeShellScript "jellyfin-monitor-start" ''
export JELLYFIN_API_KEY=$(cat $CREDENTIALS_DIRECTORY/jellyfin-api-key)
exec ${
pkgs.python3.withPackages (ps: with ps; [ requests ])
}/bin/python ${./jellyfin-qbittorrent-monitor.py}
'';
Restart = "always";
RestartSec = "10s";
# Security hardening
DynamicUser = true;
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
ProtectKernelTunables = true;
ProtectKernelModules = true;
ProtectControlGroups = true;
MemoryDenyWriteExecute = true;
RestrictRealtime = true;
RestrictSUIDSGID = true;
RemoveIPC = true;
# Load credentials from agenix secrets
LoadCredential = "jellyfin-api-key:${config.age.secrets.jellyfin-api-key.path}";
};
environment = {
JELLYFIN_URL = "http://localhost:${builtins.toString jellyfinPort}";
QBITTORRENT_URL = "http://${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString service_configs.ports.private.torrent.port}";
CHECK_INTERVAL = "30";
# Bandwidth budget configuration
TOTAL_BANDWIDTH_BUDGET = "30000000"; # 30 Mbps in bits per second
SERVICE_BUFFER = "5000000"; # 5 Mbps reserved for other services (bps)
DEFAULT_STREAM_BITRATE = "10000000"; # 10 Mbps fallback when bitrate unknown (bps)
MIN_TORRENT_SPEED = "100"; # KB/s - below this, pause torrents instead
STREAM_BITRATE_HEADROOM = "1.1"; # multiplier per stream for bitrate fluctuations
# Webhook receiver: Jellyfin Webhook plugin POSTs events here to throttle immediately.
WEBHOOK_BIND = "127.0.0.1";
WEBHOOK_PORT = toString webhookPort;
};
};
}
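
The monitor's actual limit calculation is not shown in this diff. As a rough sketch of how the budget variables above might combine (an assumption for illustration, not the script's confirmed logic), the remaining bandwidth could be computed by subtracting the service buffer and the headroom-adjusted stream bitrates from the total budget. The function name and the KB conversion (1 KB = 1024 bytes) are hypothetical.

```python
def torrent_limit_kb_per_s(
    stream_bitrates_bps,
    total_budget_bps=30_000_000,
    service_buffer_bps=5_000_000,
    headroom=1.1,
    min_torrent_speed_kb=100,
):
    """Return (limit_in_KB_per_s, should_pause). Hypothetical helper."""
    # Reserve each stream's bitrate plus headroom for fluctuations.
    reserved_bps = sum(b * headroom for b in stream_bitrates_bps)
    remaining_bps = total_budget_bps - service_buffer_bps - reserved_bps
    # Assumption: 8 bits per byte, 1 KB = 1024 bytes.
    limit = int(max(remaining_bps, 0) / 8 / 1024)
    # Below the minimum speed, pausing is preferred over throttling.
    return limit, limit < min_torrent_speed_kb
```

With the defaults above, a single 10 Mbps stream leaves roughly 14 Mbps for torrents, while three such streams exceed the budget and would trigger a pause.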


@@ -7,6 +7,8 @@ import sys
import signal
import json
import ipaddress
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
logging.basicConfig(
level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
@@ -34,6 +36,8 @@ class JellyfinQBittorrentMonitor:
default_stream_bitrate=10000000,
min_torrent_speed=100,
stream_bitrate_headroom=1.1,
webhook_port=0,
webhook_bind="127.0.0.1",
):
self.jellyfin_url = jellyfin_url
self.qbittorrent_url = qbittorrent_url
@@ -57,6 +61,12 @@ class JellyfinQBittorrentMonitor:
self.streaming_stop_delay = streaming_stop_delay
self.last_state_change = 0
# Webhook receiver: allows Jellyfin to push events instead of waiting for the poll
self.webhook_port = webhook_port
self.webhook_bind = webhook_bind
self.wake_event = threading.Event()
self.webhook_server = None
# Local network ranges (RFC 1918 private networks + localhost)
self.local_networks = [
ipaddress.ip_network("10.0.0.0/8"),
@@ -79,9 +89,56 @@ class JellyfinQBittorrentMonitor:
    def signal_handler(self, signum, frame):
        logger.info("Received shutdown signal, cleaning up...")
        self.running = False
        if self.webhook_server is not None:
            # shutdown() blocks until serve_forever returns; run from a thread so we don't deadlock
            threading.Thread(target=self.webhook_server.shutdown, daemon=True).start()
        self.restore_normal_limits()
        sys.exit(0)

    def wake(self) -> None:
        """Signal the main loop to re-evaluate state immediately."""
        self.wake_event.set()

    def sleep_or_wake(self, seconds: float) -> None:
        """Wait up to `seconds`, returning early if a webhook wakes the loop."""
        self.wake_event.wait(seconds)
        self.wake_event.clear()

    def start_webhook_server(self) -> None:
        """Start a background HTTP server that wakes the monitor on any POST."""
        if not self.webhook_port:
            return

        monitor = self

        class WebhookHandler(BaseHTTPRequestHandler):
            def do_POST(self):  # noqa: N802
                length = int(self.headers.get("Content-Length", "0") or "0")
                body = self.rfile.read(min(length, 65536)) if length else b""
                event = "unknown"
                try:
                    if body:
                        event = json.loads(body).get("NotificationType", "unknown")
                except (json.JSONDecodeError, ValueError):
                    pass
                logger.info(f"Webhook received: {event}")
                self.send_response(204)
                self.end_headers()
                monitor.wake()

            def log_message(self, format, *args):
                return  # suppress default access log

        self.webhook_server = HTTPServer(
            (self.webhook_bind, self.webhook_port), WebhookHandler
        )
        threading.Thread(
            target=self.webhook_server.serve_forever, daemon=True, name="webhook-server"
        ).start()
        logger.info(
            f"Webhook receiver listening on http://{self.webhook_bind}:{self.webhook_port}"
        )

    def check_jellyfin_sessions(self) -> list[dict]:
        headers = (
            {"X-Emby-Token": self.jellyfin_api_key} if self.jellyfin_api_key else {}
@@ -297,10 +354,14 @@ class JellyfinQBittorrentMonitor:
        logger.info(f"Default stream bitrate: {self.default_stream_bitrate} bps")
        logger.info(f"Minimum torrent speed: {self.min_torrent_speed} KB/s")
        logger.info(f"Stream bitrate headroom: {self.stream_bitrate_headroom}x")
        if self.webhook_port:
            logger.info(f"Webhook receiver: {self.webhook_bind}:{self.webhook_port}")

        signal.signal(signal.SIGINT, self.signal_handler)
        signal.signal(signal.SIGTERM, self.signal_handler)

        self.start_webhook_server()

        while self.running:
            try:
                self.sync_qbittorrent_state()
@@ -309,7 +370,7 @@ class JellyfinQBittorrentMonitor:
active_streams = self.check_jellyfin_sessions()
except ServiceUnavailable:
logger.warning("Jellyfin unavailable, maintaining current state")
time.sleep(self.check_interval)
self.sleep_or_wake(self.check_interval)
continue
streaming_active = len(active_streams) > 0
@@ -394,13 +455,13 @@ class JellyfinQBittorrentMonitor:
self.current_state = desired_state
self.last_active_streams = active_streams
time.sleep(self.check_interval)
self.sleep_or_wake(self.check_interval)
except KeyboardInterrupt:
break
except Exception as e:
logger.error(f"Unexpected error in monitoring loop: {e}")
time.sleep(self.check_interval)
self.sleep_or_wake(self.check_interval)
self.restore_normal_limits()
logger.info("Monitor stopped")
@@ -421,6 +482,8 @@ if __name__ == "__main__":
default_stream_bitrate = int(os.getenv("DEFAULT_STREAM_BITRATE", "10000000"))
min_torrent_speed = int(os.getenv("MIN_TORRENT_SPEED", "100"))
stream_bitrate_headroom = float(os.getenv("STREAM_BITRATE_HEADROOM", "1.1"))
webhook_port = int(os.getenv("WEBHOOK_PORT", "0"))
webhook_bind = os.getenv("WEBHOOK_BIND", "127.0.0.1")
monitor = JellyfinQBittorrentMonitor(
jellyfin_url=jellyfin_url,
@@ -434,6 +497,8 @@ if __name__ == "__main__":
default_stream_bitrate=default_stream_bitrate,
min_torrent_speed=min_torrent_speed,
stream_bitrate_headroom=stream_bitrate_headroom,
webhook_port=webhook_port,
webhook_bind=webhook_bind,
)
monitor.run()
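
The wake-on-webhook pattern introduced by this change can be sketched in isolation as follows. This is a minimal standalone sketch, not the monitor itself: `WakeHandler` and `demo` are hypothetical names, and unlike the monitor (where port 0 disables the receiver) the sketch binds to port 0 so the OS picks a free port for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

wake_event = threading.Event()


class WakeHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # noqa: N802
        length = int(self.headers.get("Content-Length", "0") or "0")
        if length:
            self.rfile.read(length)  # drain the request body
        self.send_response(204)
        self.end_headers()
        wake_event.set()  # wake the polling loop

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def sleep_or_wake(seconds):
    """Wait up to `seconds`; return True if a POST woke us early."""
    woke = wake_event.wait(seconds)
    wake_event.clear()
    return woke


def demo():
    """POST once to a throwaway server and report (status, woke)."""
    server = HTTPServer(("127.0.0.1", 0), WakeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("POST", "/", body=b"{}")
    status = conn.getresponse().status
    conn.close()
    woke = sleep_or_wake(5)  # returns as soon as the handler sets the event
    server.shutdown()
    server.server_close()
    return status, woke
```

Because `Event.wait` returns early when the event is set, the polling interval becomes an upper bound on reaction time rather than a fixed delay.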


@@ -0,0 +1,105 @@
{ pkgs, lib }:
let
pluginVersion = "18.0.0.0";
# GUID from the plugin's meta.json; addresses it on /Plugins/<guid>/Configuration.
pluginGuid = "71552a5a-5c5c-4350-a2ae-ebe451a30173";
package = pkgs.stdenvNoCC.mkDerivation {
pname = "jellyfin-plugin-webhook";
version = pluginVersion;
src = pkgs.fetchurl {
url = "https://repo.jellyfin.org/files/plugin/webhook/webhook_${pluginVersion}.zip";
hash = "sha256-LFFojiPnBGl9KJ0xVyPBnCmatcaeVbllRwRkz5Z3dqI=";
};
nativeBuildInputs = [ pkgs.unzip ];
unpackPhase = ''unzip "$src"'';
installPhase = ''
mkdir -p "$out"
cp *.dll meta.json "$out/"
'';
dontFixup = true; # managed .NET assemblies must not be patched
};
# Minimal Handlebars template, base64 encoded. The monitor only needs the POST;
# NotificationType is parsed for the debug log line.
# Decoded: {"NotificationType":"{{NotificationType}}"}
templateB64 = "eyJOb3RpZmljYXRpb25UeXBlIjoie3tOb3RpZmljYXRpb25UeXBlfX0ifQ==";
# Build a PluginConfiguration payload accepted by Jellyfin's JSON deserializer.
# Each webhook is `{ name, uri, notificationTypes }`.
mkConfigJson =
webhooks:
builtins.toJSON {
ServerUrl = "";
GenericOptions = map (w: {
NotificationTypes = w.notificationTypes;
WebhookName = w.name;
WebhookUri = w.uri;
EnableMovies = true;
EnableEpisodes = true;
EnableVideos = true;
EnableWebhook = true;
Template = templateB64;
Headers = [
{
Key = "Content-Type";
Value = "application/json";
}
];
}) webhooks;
};
# Oneshot that POSTs the plugin configuration. Retries past the window
# between Jellyfin API health and plugin registration.
mkConfigureScript =
{ jellyfinUrl, webhooks }:
pkgs.writeShellScript "jellyfin-webhook-configure" ''
set -euo pipefail
export PATH=${
lib.makeBinPath [
pkgs.coreutils
pkgs.curl
]
}
URL=${lib.escapeShellArg jellyfinUrl}
AUTH="Authorization: MediaBrowser Token=\"$(cat "$CREDENTIALS_DIRECTORY/jellyfin-api-key")\""
CONFIG=${lib.escapeShellArg (mkConfigJson webhooks)}
for _ in $(seq 1 120); do curl -sf -o /dev/null "$URL/health" && break; sleep 1; done
curl -sf -o /dev/null "$URL/health"
for _ in $(seq 1 60); do
if printf '%s' "$CONFIG" | curl -sf -X POST \
-H "$AUTH" -H "Content-Type: application/json" --data-binary @- \
"$URL/Plugins/${pluginGuid}/Configuration"; then
echo "Jellyfin webhook plugin configured"; exit 0
fi
sleep 1
done
echo "Failed to configure webhook plugin" >&2; exit 1
'';
# Materialise a writable copy of the plugin. Jellyfin rewrites meta.json at
# runtime, so a read-only nix-store symlink would EACCES.
mkInstallScript =
{ pluginsDir }:
pkgs.writeShellScript "jellyfin-webhook-install" ''
set -euo pipefail
export PATH=${lib.makeBinPath [ pkgs.coreutils ]}
dst=${lib.escapeShellArg "${pluginsDir}/Webhook_${pluginVersion}"}
mkdir -p ${lib.escapeShellArg pluginsDir}
rm -rf "$dst" && mkdir -p "$dst"
cp ${package}/*.dll ${package}/meta.json "$dst/"
chmod u+rw "$dst"/*
'';
in
{
inherit
package
pluginVersion
pluginGuid
mkConfigureScript
mkInstallScript
;
}
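
As a quick check of the base64 template and the payload shape built by `mkConfigJson`, the following Python sketch decodes the template string and mirrors the JSON structure. It is an illustration only: `decode_template` and `mk_config` are hypothetical names, and the field names are copied from the Nix expression above.

```python
import base64
import json

TEMPLATE_B64 = "eyJOb3RpZmljYXRpb25UeXBlIjoie3tOb3RpZmljYXRpb25UeXBlfX0ifQ=="


def decode_template(b64=TEMPLATE_B64):
    """Return the Handlebars template text behind the base64 string."""
    return base64.b64decode(b64).decode("utf-8")


def mk_config(webhooks):
    """Mirror of mkConfigJson; each webhook has name, uri, notificationTypes."""
    return {
        "ServerUrl": "",
        "GenericOptions": [
            {
                "NotificationTypes": w["notificationTypes"],
                "WebhookName": w["name"],
                "WebhookUri": w["uri"],
                "EnableMovies": True,
                "EnableEpisodes": True,
                "EnableVideos": True,
                "EnableWebhook": True,
                "Template": TEMPLATE_B64,
                "Headers": [{"Key": "Content-Type", "Value": "application/json"}],
            }
            for w in webhooks
        ],
    }


# The URI below uses a placeholder port; the real one comes from service_configs.
example = mk_config(
    [{"name": "qBittorrent Monitor", "uri": "http://127.0.0.1:9999/",
      "notificationTypes": ["PlaybackStart", "PlaybackStop"]}]
)
payload = json.dumps(example)
```

Before substitution the template is itself valid JSON, so it can be parsed directly to confirm that the only field is `NotificationType`.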


@@ -26,6 +26,14 @@
services.caddy.virtualHosts."jellyfin.${service_configs.https.domain}".extraConfig = ''
reverse_proxy :${builtins.toString service_configs.ports.private.jellyfin.port} {
# Disable response buffering for streaming. Caddy's default partial
# buffering delays fMP4-HLS segments and direct-play responses where
# Content-Length is known (so auto-flush doesn't trigger).
flush_interval -1
transport http {
# Localhost: compression wastes CPU re-encoding already-compressed media.
compression off
}
header_up X-Real-IP {remote_host}
header_up X-Forwarded-For {remote_host}
header_up X-Forwarded-Proto {scheme}

services/llama-cpp.nix Normal file

@@ -0,0 +1,103 @@
{
pkgs,
service_configs,
config,
inputs,
lib,
utils,
...
}:
let
cfg = config.services.llama-cpp;
modelUrl = "https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/resolve/main/google_gemma-4-E2B-it-IQ2_M.gguf";
modelAlias = lib.removeSuffix ".gguf" (baseNameOf modelUrl);
in
{
imports = [
(lib.mkCaddyReverseProxy {
subdomain = "llm";
port = service_configs.ports.private.llama_cpp.port;
})
];
services.llama-cpp = {
enable = true;
model = toString (
pkgs.fetchurl {
url = modelUrl;
sha256 = "17e869ac54d0e59faa884d5319fc55ad84cd866f50f0b3073fbb25accc875a23";
}
);
port = service_configs.ports.private.llama_cpp.port;
host = "0.0.0.0";
package = lib.optimizePackage (
inputs.llamacpp.packages.${pkgs.system}.vulkan.overrideAttrs (old: {
patches = (old.patches or [ ]) ++ [
];
})
);
extraFlags = [
"-ngl"
"999"
"-c"
"65536"
"-ctk"
"turbo3"
"-ctv"
"turbo3"
"-fa"
"on"
"--api-key-file"
config.age.secrets.llama-cpp-api-key.path
"--metrics"
"--alias"
modelAlias
"-b"
"4096"
"-ub"
"4096"
"--parallel"
"2"
];
};
# DynamicUser must be disabled for Vulkan to work
systemd.services.llama-cpp.serviceConfig.DynamicUser = lib.mkForce false;
# ANV driver's turbo3 shader compilation exceeds the default 8 MB thread stack.
systemd.services.llama-cpp.serviceConfig.LimitSTACK = lib.mkForce "67108864"; # 64 MB soft+hard
# llama-server tries to create ~/.cache; ProtectSystem=strict + impermanent
# root make /root read-only. Give it a writable cache dir and point HOME there.
systemd.services.llama-cpp.serviceConfig.CacheDirectory = "llama-cpp";
systemd.services.llama-cpp.environment.HOME = "/var/cache/llama-cpp";
# turbo3 KV cache quantization runs a 14-barrier WHT butterfly per 128-element
# workgroup in SET_ROWS. With 4 concurrent slots and batch=4096, the combined
# GPU dispatch can exceed the default i915 CCS engine preempt timeout (7.5s),
# causing GPU HANG -> ErrorDeviceLost. Increase compute engine timeouts.
# Note: batch<4096 is not viable -- GDN chunked mode needs a larger compute
# buffer at smaller batch sizes, exceeding the A380's 6 GB VRAM.
# '+' prefix runs as root regardless of service User=.
systemd.services.llama-cpp.serviceConfig.ExecStartPre = [
"+${pkgs.writeShellScript "set-gpu-compute-timeout" ''
for f in /sys/class/drm/card*/engine/ccs*/preempt_timeout_ms; do
[ -w "$f" ] && echo 30000 > "$f"
done
for f in /sys/class/drm/card*/engine/ccs*/heartbeat_interval_ms; do
[ -w "$f" ] && echo 10000 > "$f"
done
''}"
];
# upstream module hardcodes --log-disable; override ExecStart to keep logs
# so we can see prompt processing progress via journalctl
systemd.services.llama-cpp.serviceConfig.ExecStart = lib.mkForce (
"${cfg.package}/bin/llama-server"
+ " --host ${cfg.host}"
+ " --port ${toString cfg.port}"
+ " -m ${cfg.model}"
+ " ${utils.escapeSystemdExecArgs cfg.extraFlags}"
);
}


@@ -9,7 +9,7 @@
enable = true;
realm = service_configs.https.domain;
use-auth-secret = true;
static-auth-secret = lib.strings.trim (builtins.readFile ../secrets/coturn_static_auth_secret);
static-auth-secret-file = config.age.secrets.coturn-auth-secret.path;
listening-port = service_configs.ports.public.coturn.port;
tls-listening-port = service_configs.ports.public.coturn_tls.port;
no-cli = true;


@@ -0,0 +1,7 @@
{
imports = [
./matrix.nix
./coturn.nix
./livekit.nix
];
}


@@ -3,7 +3,7 @@
...
}:
let
keyFile = ../secrets/livekit_keys;
keyFile = ../../secrets/livekit_keys;
in
{
services.livekit = {


@@ -12,6 +12,10 @@
(lib.serviceFilePerms "continuwuity" [
"Z /var/lib/private/continuwuity 0770 ${config.services.matrix-continuwuity.user} ${config.services.matrix-continuwuity.group}"
])
(lib.mkCaddyReverseProxy {
domain = service_configs.matrix.domain;
port = service_configs.ports.private.matrix.port;
})
];
services.matrix-continuwuity = {
@@ -21,7 +25,7 @@
port = [ service_configs.ports.private.matrix.port ];
server_name = service_configs.https.domain;
allow_registration = true;
registration_token = lib.strings.trim (builtins.readFile ../secrets/matrix_reg_token);
registration_token_file = config.age.secrets.matrix-reg-token.path;
new_user_displayname_suffix = "";
@@ -37,7 +41,7 @@
];
# TURN server config (coturn)
turn_secret = config.services.coturn.static-auth-secret;
turn_secret_file = config.age.secrets.matrix-turn-secret.path;
turn_uris = [
"turn:${service_configs.https.domain}?transport=udp"
"turn:${service_configs.https.domain}?transport=tcp"
@@ -53,10 +57,6 @@
respond /.well-known/matrix/client `{"m.server":{"base_url":"https://${service_configs.matrix.domain}"},"m.homeserver":{"base_url":"https://${service_configs.matrix.domain}"},"org.matrix.msc3575.proxy":{"base_url":"https://${config.services.matrix-continuwuity.settings.global.server_name}"},"org.matrix.msc4143.rtc_foci":[{"type":"livekit","livekit_service_url":"https://${service_configs.livekit.domain}"}]}`
'';
services.caddy.virtualHosts."${service_configs.matrix.domain}".extraConfig = ''
reverse_proxy :${builtins.toString service_configs.ports.private.matrix.port}
'';
# Exact duplicate for federation port
services.caddy.virtualHosts."${service_configs.matrix.domain}:${builtins.toString service_configs.ports.public.matrix_federation.port}".extraConfig =
config.services.caddy.virtualHosts."${service_configs.matrix.domain}".extraConfig;


@@ -37,15 +37,21 @@
servers.${service_configs.minecraft.server_name} = {
enable = true;
package = pkgs.fabricServers.fabric-1_21_11;
package = pkgs.fabricServers.fabric-26_1_2.override { jre_headless = pkgs.openjdk25_headless; };
jvmOpts = lib.concatStringsSep " " [
# Memory
"-Xmx${builtins.toString service_configs.minecraft.memory.heap_size_m}M"
"-Xms${builtins.toString service_configs.minecraft.memory.heap_size_m}M"
# GC
"-XX:+UseZGC"
"-XX:+ZGenerational"
# added in new minecraft version
"-XX:+UseCompactObjectHeaders"
"-XX:+UseStringDeduplication"
# Base JVM optimizations (brucethemoose/Minecraft-Performance-Flags-Benchmarks)
"-XX:+UnlockExperimentalVMOptions"
"-XX:+UnlockDiagnosticVMOptions"
@@ -67,6 +73,7 @@
"-XX:NonProfiledCodeHeapSize=194M"
"-XX:NmethodSweepActivity=1"
"-XX:+UseVectorCmov"
# Large pages (requires vm.nr_hugepages sysctl)
"-XX:+UseLargePages"
"-XX:LargePageSizeInBytes=${builtins.toString service_configs.minecraft.memory.large_page_size_m}M"
@@ -92,71 +99,68 @@
with pkgs;
builtins.attrValues {
FabricApi = fetchurl {
url = "https://cdn.modrinth.com/data/P7dR8mSH/versions/i5tSkVBH/fabric-api-0.141.3%2B1.21.11.jar";
sha512 = "c20c017e23d6d2774690d0dd774cec84c16bfac5461da2d9345a1cd95eee495b1954333c421e3d1c66186284d24a433f6b0cced8021f62e0bfa617d2384d0471";
url = "https://cdn.modrinth.com/data/P7dR8mSH/versions/fm7UYECV/fabric-api-0.145.4%2B26.1.2.jar";
sha512 = "ffd5ef62a745f76cd2e5481252cb7bc67006c809b4f436827d05ea22c01d19279e94a3b24df3d57e127af1cd08440b5de6a92a4ea8f39b2dcbbe1681275564c3";
};
FerriteCore = fetchurl {
url = "https://cdn.modrinth.com/data/uXXizFIs/versions/Ii0gP3D8/ferritecore-8.2.0-fabric.jar";
sha512 = "3210926a82eb32efd9bcebabe2f6c053daf5c4337eebc6d5bacba96d283510afbde646e7e195751de795ec70a2ea44fef77cb54bf22c8e57bb832d6217418869";
};
# No 26.1.2 version available
# FerriteCore = fetchurl {
# url = "https://cdn.modrinth.com/data/uXXizFIs/versions/d5ddUdiB/ferritecore-9.0.0-fabric.jar";
# sha512 = "d81fa97e11784c19d42f89c2f433831d007603dd7193cee45fa177e4a6a9c52b384b198586e04a0f7f63cd996fed713322578bde9a8db57e1188854ae5cbe584";
# };
Lithium = fetchurl {
url = "https://cdn.modrinth.com/data/gvQqBUqZ/versions/Ow7wA0kG/lithium-fabric-0.21.4%2Bmc1.21.11.jar";
sha512 = "f14a5c3d2fad786347ca25083f902139694f618b7c103947f2fd067a7c5ee88a63e1ef8926f7d693ea79ed7d00f57317bae77ef9c2d630bf5ed01ac97a752b94";
url = "https://cdn.modrinth.com/data/gvQqBUqZ/versions/v2xoRvRP/lithium-fabric-0.24.1%2Bmc26.1.2.jar";
sha512 = "8711bc8c6f39be4c8511becb7a68e573ced56777bd691639f2fc62299b35bb4ccd2efe4a39bd9c308084b523be86a5f5c4bf921ab85f7a22bf075d8ea2359621";
};
NoChatReports = fetchurl {
url = "https://cdn.modrinth.com/data/qQyHxfxd/versions/rhykGstm/NoChatReports-FABRIC-1.21.11-v2.18.0.jar";
sha512 = "d2c35cc8d624616f441665aff67c0e366e4101dba243bad25ed3518170942c1a3c1a477b28805cd1a36c44513693b1c55e76bea627d3fced13927a3d67022ccc";
url = "https://cdn.modrinth.com/data/qQyHxfxd/versions/2yrLNE3S/NoChatReports-FABRIC-26.1-v2.19.0.jar";
sha512 = "94d58a1a4cde4e3b1750bdf724e65c5f4ff3436c2532f36a465d497d26bf59f5ac996cddbff8ecdfed770c319aa2f2dcc9c7b2d19a35651c2a7735c5b2124dad";
};
squaremap = fetchurl {
url = "https://cdn.modrinth.com/data/PFb7ZqK6/versions/BW8lMXBi/squaremap-fabric-mc1.21.11-1.3.12.jar";
sha512 = "f62eb791a3f5812eb174565d318f2e6925353f846ef8ac56b4e595f481494e0c281f26b9e9fcfdefa855093c96b735b12f67ee17c07c2477aa7a3439238670d9";
url = "https://cdn.modrinth.com/data/PFb7ZqK6/versions/UBN6MFvH/squaremap-fabric-mc26.1.2-1.3.13.jar";
sha512 = "97bc130184b5d0ddc4ff98a15acef6203459d982e0e2afbd49a2976d546c55a86ef22b841378b51dd782be9b2cfbe4cfa197717f2b7f6800fd8b4ff4df6e564f";
};
scalablelux = fetchurl {
url = "https://cdn.modrinth.com/data/Ps1zyz6x/versions/PV9KcrYQ/ScalableLux-0.1.6%2Bfabric.c25518a-all.jar";
sha512 = "729515c1e75cf8d9cd704f12b3487ddb9664cf9928e7b85b12289c8fbbc7ed82d0211e1851375cbd5b385820b4fedbc3f617038fff5e30b302047b0937042ae7";
url = "https://cdn.modrinth.com/data/Ps1zyz6x/versions/gYbHVCz8/ScalableLux-0.2.0%2Bfabric.2b63825-all.jar";
sha512 = "48565a4d8a1cbd623f0044086d971f2c0cf1c40e1d0b6636a61d41512f4c1c1ddff35879d9dba24b088a670ee254e2d5842d13a30b6d76df23706fa94ea4a58b";
};
c2me = fetchurl {
url = "https://cdn.modrinth.com/data/VSNURh3q/versions/QdLiMUjx/c2me-fabric-mc1.21.11-0.3.7%2Balpha.0.7.jar";
sha512 = "f9543febe2d649a82acd6d5b66189b6a3d820cf24aa503ba493fdb3bbd4e52e30912c4c763fe50006f9a46947ae8cd737d420838c61b93429542573ed67f958e";
url = "https://cdn.modrinth.com/data/VSNURh3q/versions/yrNQQ1AQ/c2me-fabric-mc26.1.2-0.3.7%2Balpha.0.65.jar";
sha512 = "6666ebaa3bfa403e386776590fc845b7c306107d37ebc7b1be3b057893fbf9f933abb2314c171d7fe19c177cf8823cb47fdc32040d34a9704f5ab656dd5d93f8";
};
krypton = fetchurl {
url = "https://cdn.modrinth.com/data/fQEb0iXm/versions/O9LmWYR7/krypton-0.2.10.jar";
sha512 = "4dcd7228d1890ddfc78c99ff284b45f9cf40aae77ef6359308e26d06fa0d938365255696af4cc12d524c46c4886cdcd19268c165a2bf0a2835202fe857da5cab";
};
# No 26.1 version available
# krypton = fetchurl {
# url = "https://cdn.modrinth.com/data/fQEb0iXm/versions/O9LmWYR7/krypton-0.2.10.jar";
# sha512 = "4dcd7228d1890ddfc78c99ff284b45f9cf40aae77ef6359308e26d06fa0d938365255696af4cc12d524c46c4886cdcd19268c165a2bf0a2835202fe857da5cab";
# };
better-fabric-console = fetchurl {
url = "https://cdn.modrinth.com/data/Y8o1j1Sf/versions/6aIKl5wy/better-fabric-console-mc1.21.11-1.2.9.jar";
sha512 = "427247dafd99df202ee10b4bf60ffcbbecbabfadb01c167097ffb5b85670edb811f4d061c2551be816295cbbc6b8ec5ec464c14a6ff41912ef1f6c57b038d320";
};
disconnect-packet-fix = fetchurl {
url = "https://cdn.modrinth.com/data/rd9rKuJT/versions/Gv74xveQ/disconnect-packet-fix-fabric-2.0.0.jar";
sha512 = "1fd6f09a41ce36284e1a8e9def53f3f6834d7201e69e54e24933be56445ba569fbc26278f28300d36926ba92db6f4f9c0ae245d23576aaa790530345587316db";
};
# No 26.1.2 version available
# disconnect-packet-fix = fetchurl {
# url = "https://cdn.modrinth.com/data/rd9rKuJT/versions/x9gVeaTU/disconnect-packet-fix-fabric-2.1.0.jar";
# sha512 = "bf84d02bdcd737706df123e452dd31ef535580fa4ced6af1e4ceea022fef94e4764775253e970b8caa1292e2fa00eb470557f70b290fafdb444479fa801b07a1";
# };
packet-fixer = fetchurl {
url = "https://cdn.modrinth.com/data/c7m1mi73/versions/CUh1DWeO/packetfixer-fabric-3.3.4-1.21.11.jar";
sha512 = "33331b16cb40c5e6fbaade3cacc26f3a0e8fa5805a7186f94d7366a0e14dbeee9de2d2e8c76fa71f5e9dd24eb1c261667c35447e32570ea965ca0f154fdfba0a";
url = "https://cdn.modrinth.com/data/c7m1mi73/versions/M8PqPQr4/packetfixer-fabric-3.3.4-26.1.2.jar";
sha512 = "698020edba2a1fd80bb282bfd4832a00d6447b08eaafbc2e16a8f3bf89e187fc9a622c92dfe94ae140dd485fc0220a86890f12158ec08054e473fef8337829bc";
};
# fork of Modernfix for 1.21.11 (upstream will support 26.1)
# mVUS fork: upstream ModernFix no longer ships Fabric builds
modernfix = fetchurl {
url = "https://cdn.modrinth.com/data/TjSm1wrD/versions/JwSO8JCN/modernfix-5.25.2-build.4.jar";
sha512 = "0d65c05ac0475408c58ef54215714e6301113101bf98bfe4bb2ba949fbfddd98225ac4e2093a5f9206a9e01ba80a931424b237bdfa3b6e178c741ca6f7f8c6a3";
url = "https://cdn.modrinth.com/data/TjSm1wrD/versions/dqQ7mabN/modernfix-5.26.2-build.1.jar";
sha512 = "fbef93c2dabf7bcd0ccd670226dfc4958f7ebe5d8c2b1158e88a65e6954a40f595efd58401d2a3dbb224660dca5952199cf64df29100e7bd39b1b1941290b57b";
};
debugify = fetchurl {
url = "https://cdn.modrinth.com/data/QwxR6Gcd/versions/8Q49lnaU/debugify-1.21.11%2B1.0.jar";
sha512 = "04d82dd33f44ced37045f1f9a54ad4eacd70861ff74a8800f2d2df358579e6cb0ea86a34b0086b3e87026b1a0691dd6594b4fdc49f89106466eea840518beb03";
url = "https://cdn.modrinth.com/data/QwxR6Gcd/versions/mfTTfiKn/debugify-26.1.2%2B1.0.jar";
sha512 = "63db82f2163b9f7fc27ebea999ffcd7a961054435b3ed7d8bf32d905b5f60ce81715916b7fd4e9509dd23703d5492059f3ce7e5f176402f8ed4f985a415553f4";
};
}
);
};


@@ -0,0 +1,8 @@
{
imports = [
./monero.nix
./p2pool.nix
./xmrig.nix
./xmrig-auto-pause.nix
];
}


@@ -4,9 +4,6 @@
lib,
...
}:
let
walletAddress = lib.strings.trim (builtins.readFile ../secrets/xmrig-wallet);
in
{
imports = [
(lib.serviceMountWithZpool "p2pool" service_configs.zpool_ssds [
@@ -20,7 +17,7 @@ in
services.p2pool = {
enable = true;
dataDir = service_configs.p2pool.dataDir;
walletAddress = walletAddress;
walletAddress = service_configs.p2pool.walletAddress;
sidechain = "nano";
host = "127.0.0.1";
rpcPort = service_configs.ports.public.monero_rpc.port;
@@ -36,12 +33,6 @@ in
wants = [ "monero.service" ];
};
# Stop p2pool on UPS battery to conserve power
services.apcupsd.hooks = lib.mkIf config.services.apcupsd.enable {
onbattery = "systemctl stop p2pool";
offbattery = "systemctl start p2pool";
};
networking.firewall.allowedTCPPorts = [
service_configs.ports.public.p2pool_p2p.port
];


@@ -0,0 +1,39 @@
{
config,
lib,
pkgs,
...
}:
lib.mkIf config.services.xmrig.enable {
systemd.services.xmrig-auto-pause = {
description = "Auto-pause xmrig when other services need CPU";
after = [ "xmrig.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = "${pkgs.python3}/bin/python3 ${./xmrig-auto-pause.py}";
Restart = "always";
RestartSec = "10s";
NoNewPrivileges = true;
ProtectHome = true;
ProtectSystem = "strict";
PrivateTmp = true;
RestrictAddressFamilies = [
"AF_UNIX" # systemctl talks to systemd over D-Bus unix socket
];
MemoryDenyWriteExecute = true;
StateDirectory = "xmrig-auto-pause";
};
environment = {
POLL_INTERVAL = "3";
GRACE_PERIOD = "15";
# Background services (qbittorrent, bitmagnet, postgresql, etc.) produce
# 15-25% non-nice CPU during normal operation. The stop threshold must
# sit above transient spikes; the resume threshold must be below the
# steady-state floor to avoid restarting xmrig while services are active.
CPU_STOP_THRESHOLD = "40";
CPU_RESUME_THRESHOLD = "10";
STARTUP_COOLDOWN = "10";
STATE_DIR = "/var/lib/xmrig-auto-pause";
};
};
}
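
The threshold comment in the environment block above describes a two-threshold (hysteresis) scheme. A minimal sketch of how a single CPU sample is classified under the 40/10 values configured here follows. The function name `classify` and the three labels are hypothetical and only illustrate the comparison; they are not part of the deployed script.

```python
def classify(real_work_pct, stop_threshold=40.0, resume_threshold=10.0):
    """Classify one non-nice CPU sample against the two thresholds.

    'stop' : genuine workload, the miner should be stopped
    'idle' : quiet enough to count toward the resume grace period
    'hold' : between thresholds, neither stop nor resume
    """
    if real_work_pct > stop_threshold:
        return "stop"
    if real_work_pct <= resume_threshold:
        return "idle"
    return "hold"


# Background services at 15-25% land in the 'hold' band, so they neither
# stop a running miner nor let a paused one resume.
samples = [5.0, 20.0, 45.0, 20.0, 8.0]
print([classify(s) for s in samples])
```

The gap between the two thresholds is what keeps the steady 15-25% background load from toggling the miner in either direction.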


@@ -0,0 +1,210 @@
#!/usr/bin/env python3
"""
Auto-pause xmrig when other services need CPU.
Monitors non-nice CPU usage from /proc/stat. Since xmrig runs at Nice=19,
its CPU time lands in the 'nice' column and is excluded from the metric.
When real workload (user + system + irq + softirq) exceeds the stop
threshold, stops xmrig. When it drops below the resume threshold for
GRACE_PERIOD seconds, restarts xmrig.
This replaces per-service pause scripts with a single general-purpose
monitor that handles any CPU-intensive workload (gitea workers, llama-cpp
inference, etc.) without needing to know about specific processes.
Why scheduler priority alone isn't enough:
Nice=19 / SCHED_IDLE only affects which thread gets the next time slice.
RandomX's 2MB-per-thread scratchpad (24MB across 12 threads) pollutes
the shared 32MB L3 cache, and its memory access pattern saturates DRAM
bandwidth. Other services run slower even though they aren't denied CPU
time. The only fix is to stop xmrig entirely when real work is happening.
Hysteresis:
The stop threshold is set higher than the resume threshold to prevent
oscillation. When xmrig runs, its L3 cache pressure makes other processes
appear ~3-8% busier. A single threshold trips on this indirect effect,
causing stop/start thrashing. Separate thresholds break the cycle: the
resume threshold confirms the system is truly idle, while the stop
threshold requires genuine workload above xmrig's indirect pressure.
"""
import os
import subprocess
import sys
import time
POLL_INTERVAL = int(os.environ.get("POLL_INTERVAL", "3"))
GRACE_PERIOD = float(os.environ.get("GRACE_PERIOD", "15"))
# Percentage of total CPU ticks that non-nice processes must use to trigger
# a pause. On a 12-thread system, one fully loaded core ≈ 8.3% of total.
# Default 15% requires roughly two busy cores, which avoids false positives
# from xmrig's L3 cache pressure inflating other processes' apparent CPU.
CPU_STOP_THRESHOLD = float(os.environ.get("CPU_STOP_THRESHOLD", "15"))
# Percentage below which the system is considered idle enough to resume
# mining. Lower than the stop threshold to provide hysteresis.
CPU_RESUME_THRESHOLD = float(os.environ.get("CPU_RESUME_THRESHOLD", "5"))
# After starting xmrig, ignore CPU spikes for this many seconds to let
# RandomX dataset initialization complete (~4s on the target hardware)
# without retriggering a stop.
STARTUP_COOLDOWN = float(os.environ.get("STARTUP_COOLDOWN", "10"))
# Directory for persisting pause state across script restarts. Without
# this, a restart while xmrig is paused loses the paused_by_us flag and
# xmrig stays stopped permanently.
STATE_DIR = os.environ.get("STATE_DIR", "")
_PAUSE_FILE = os.path.join(STATE_DIR, "paused") if STATE_DIR else ""
def log(msg):
print(f"[xmrig-auto-pause] {msg}", file=sys.stderr, flush=True)
def read_cpu_ticks():
"""Read CPU tick counters from /proc/stat.
Returns (total_ticks, real_work_ticks) where real_work excludes the
'nice' column (xmrig) and idle/iowait.
"""
with open("/proc/stat") as f:
parts = f.readline().split()
# cpu user nice system idle iowait irq softirq steal
user, nice, system, idle, iowait, irq, softirq, steal = (
int(x) for x in parts[1:9]
)
total = user + nice + system + idle + iowait + irq + softirq + steal
real_work = user + system + irq + softirq
return total, real_work
def is_active(unit):
"""Check if a systemd unit is currently active."""
result = subprocess.run(
["systemctl", "is-active", "--quiet", unit],
capture_output=True,
)
return result.returncode == 0
def systemctl(action, unit):
result = subprocess.run(
["systemctl", action, unit],
capture_output=True,
text=True,
)
if result.returncode != 0:
log(f"systemctl {action} {unit} failed (rc={result.returncode}): {result.stderr.strip()}")
return result.returncode == 0
def _save_paused(paused):
"""Persist pause flag so a script restart can resume where we left off."""
if not _PAUSE_FILE:
return
try:
if paused:
open(_PAUSE_FILE, "w").close()
else:
os.remove(_PAUSE_FILE)
except OSError:
pass
def _load_paused():
"""Check if a previous instance left xmrig paused."""
if not _PAUSE_FILE:
return False
return os.path.isfile(_PAUSE_FILE)
def main():
paused_by_us = _load_paused()
idle_since = None
started_at = None # monotonic time when we last started xmrig
prev_total = None
prev_work = None
if paused_by_us:
log("Recovered pause state from previous instance")
log(
f"Starting: poll={POLL_INTERVAL}s grace={GRACE_PERIOD}s "
f"stop={CPU_STOP_THRESHOLD}% resume={CPU_RESUME_THRESHOLD}% "
f"cooldown={STARTUP_COOLDOWN}s"
)
while True:
total, work = read_cpu_ticks()
if prev_total is None:
prev_total = total
prev_work = work
time.sleep(POLL_INTERVAL)
continue
dt = total - prev_total
if dt <= 0:
prev_total = total
prev_work = work
time.sleep(POLL_INTERVAL)
continue
real_work_pct = ((work - prev_work) / dt) * 100
prev_total = total
prev_work = work
# Don't act during startup cooldown — RandomX dataset init causes
# a transient CPU spike that would immediately retrigger a stop.
if started_at is not None:
if time.monotonic() - started_at < STARTUP_COOLDOWN:
time.sleep(POLL_INTERVAL)
continue
# Cooldown expired — verify xmrig survived startup. If it
# crashed during init (hugepage failure, pool unreachable, etc.),
# re-enter the pause/retry cycle rather than silently leaving
# xmrig dead.
if not is_active("xmrig.service"):
log("xmrig died during startup cooldown — will retry")
paused_by_us = True
_save_paused(True)
started_at = None
above_stop = real_work_pct > CPU_STOP_THRESHOLD
below_resume = real_work_pct <= CPU_RESUME_THRESHOLD
if above_stop:
idle_since = None
if paused_by_us and is_active("xmrig.service"):
# Something else restarted xmrig (deploy, manual start, etc.)
# while we thought it was stopped. Reset ownership so we can
# manage it again.
log("xmrig was restarted externally while paused — reclaiming")
paused_by_us = False
_save_paused(False)
if not paused_by_us:
# Only claim ownership if xmrig is actually running.
# If something else stopped it (e.g. UPS battery hook),
# don't interfere — we'd wrongly restart it later.
if is_active("xmrig.service"):
log(f"Real workload detected ({real_work_pct:.1f}% CPU) — stopping xmrig")
if systemctl("stop", "xmrig.service"):
paused_by_us = True
_save_paused(True)
elif paused_by_us:
if below_resume:
if idle_since is None:
idle_since = time.monotonic()
elif time.monotonic() - idle_since >= GRACE_PERIOD:
log(f"Workload ended ({real_work_pct:.1f}% CPU) past grace period — starting xmrig")
if systemctl("start", "xmrig.service"):
paused_by_us = False
_save_paused(False)
started_at = time.monotonic()
idle_since = None
else:
# Between thresholds — not idle enough to resume.
idle_since = None
time.sleep(POLL_INTERVAL)
if __name__ == "__main__":
main()
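
The percentage the script acts on is a delta between two `/proc/stat` samples, not an absolute counter. The sketch below isolates that arithmetic so it can be checked against hand-written input. `parse_cpu_line` and `real_work_percent` are hypothetical helper names, and the two sample lines are made-up values rather than measurements.

```python
def parse_cpu_line(line):
    """Split an aggregate 'cpu ...' line into (total_ticks, real_work_ticks)."""
    parts = line.split()
    # Field order: user nice system idle iowait irq softirq steal
    user, nice, system, idle, iowait, irq, softirq, steal = (
        int(x) for x in parts[1:9]
    )
    total = user + nice + system + idle + iowait + irq + softirq + steal
    # 'nice' (the miner) and idle/iowait are excluded from real work.
    real_work = user + system + irq + softirq
    return total, real_work


def real_work_percent(prev_line, cur_line):
    """Return non-nice CPU usage between two samples, or None if no ticks elapsed."""
    prev_total, prev_work = parse_cpu_line(prev_line)
    total, work = parse_cpu_line(cur_line)
    dt = total - prev_total
    if dt <= 0:
        return None
    return (work - prev_work) / dt * 100


before = "cpu 1000 5000 200 3000 50 10 20 0 0 0"
after = "cpu 1100 5600 230 3250 55 12 23 0 0 0"
print(real_work_percent(before, after))
```

In this made-up sample the `nice` column grows by 600 ticks, but only the 135 ticks of user, system, irq and softirq time count toward the result.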


@@ -11,7 +11,7 @@ in
{
services.xmrig = {
enable = true;
-package = pkgs.xmrig;
+package = lib.optimizePackage pkgs.xmrig;
settings = {
autosave = true;


@@ -0,0 +1,6 @@
{
imports = [
./ntfy.nix
./ntfy-alerts.nix
];
}


@@ -1,5 +1,10 @@
{ config, service_configs, ... }:
{
config,
lib,
service_configs,
...
}:
lib.mkIf config.services.ntfy-sh.enable {
services.ntfyAlerts = {
enable = true;
serverUrl = "https://${service_configs.ntfy.domain}";


@@ -12,6 +12,10 @@
(lib.serviceFilePerms "ntfy-sh" [
"Z /var/lib/private/ntfy-sh 0700 ${config.services.ntfy-sh.user} ${config.services.ntfy-sh.group}"
])
(lib.mkCaddyReverseProxy {
domain = service_configs.ntfy.domain;
port = service_configs.ports.private.ntfy.port;
})
];
services.ntfy-sh = {
@@ -27,8 +31,4 @@
};
};
services.caddy.virtualHosts."${service_configs.ntfy.domain}".extraConfig = ''
reverse_proxy :${builtins.toString service_configs.ports.private.ntfy.port}
'';
}


@@ -6,6 +6,11 @@
inputs,
...
}:
let
categoriesFile = pkgs.writeText "categories.json" (
builtins.toJSON (lib.mapAttrs (_: path: { save_path = path; }) service_configs.torrent.categories)
);
in
{
imports = [
(lib.serviceMountWithZpool "qbittorrent" service_configs.zpool_hdds [
@@ -18,10 +23,18 @@
(lib.serviceFilePerms "qbittorrent" [
# 0770: group (media) needs write to delete files during upgrades —
# Radarr/Sonarr must unlink the old file before placing the new one.
"Z ${config.services.qbittorrent.serverConfig.Preferences.Downloads.SavePath} 0770 ${config.services.qbittorrent.user} ${service_configs.media_group}"
# Non-recursive (z not Z): UMask=0007 ensures new files get correct perms.
# A recursive Z rule would walk millions of files on the HDD pool at every boot.
"z ${config.services.qbittorrent.serverConfig.Preferences.Downloads.SavePath} 0770 ${config.services.qbittorrent.user} ${service_configs.media_group}"
"z ${config.services.qbittorrent.serverConfig.Preferences.Downloads.TempPath} 0700 ${config.services.qbittorrent.user} ${config.services.qbittorrent.group}"
"Z ${config.services.qbittorrent.profileDir} 0700 ${config.services.qbittorrent.user} ${config.services.qbittorrent.group}"
])
(lib.mkCaddyReverseProxy {
subdomain = "torrent";
port = service_configs.ports.private.torrent.port;
auth = true;
vpn = true;
})
];
services.qbittorrent = {
@@ -135,10 +148,50 @@
UMask = lib.mkForce "0007";
};
services.caddy.virtualHosts."torrent.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy ${config.vpnNamespaces.wg.namespaceAddress}:${builtins.toString config.services.qbittorrent.webuiPort}
'';
# Pre-define qBittorrent categories with explicit save paths so every
# torrent routes to its category directory instead of the SavePath root.
systemd.tmpfiles.settings.qbittorrent-categories = {
"${config.services.qbittorrent.profileDir}/qBittorrent/config/categories.json"."L+" = {
argument = "${categoriesFile}";
user = config.services.qbittorrent.user;
group = config.services.qbittorrent.group;
mode = "1400";
};
};
# Ensure category directories exist with correct ownership before first use.
systemd.tmpfiles.rules = lib.mapAttrsToList (
_: path: "d ${path} 0770 ${config.services.qbittorrent.user} ${service_configs.media_group} -"
) service_configs.torrent.categories;
# Periodically checkpoint qBittorrent's SQLite WAL (Write-Ahead Log).
# qBittorrent holds a read transaction open for its entire lifetime,
# preventing SQLite's auto-checkpoint from running. The WAL grows
# unbounded (observed: 405 MB) and must be replayed on next startup,
# causing 10+ minute "internal preparations" hangs.
# A second sqlite3 connection can checkpoint concurrently and safely.
# See: https://github.com/qbittorrent/qBittorrent/issues/20433
systemd.services.qbittorrent-wal-checkpoint = {
description = "Checkpoint qBittorrent SQLite WAL";
after = [ "qbittorrent.service" ];
requires = [ "qbittorrent.service" ];
serviceConfig = {
Type = "oneshot";
ExecStart = "${pkgs.sqlite}/bin/sqlite3 ${config.services.qbittorrent.profileDir}/qBittorrent/data/torrents.db 'PRAGMA wal_checkpoint(TRUNCATE);'";
User = config.services.qbittorrent.user;
Group = config.services.qbittorrent.group;
};
};
systemd.timers.qbittorrent-wal-checkpoint = {
description = "Periodically checkpoint qBittorrent SQLite WAL";
wantedBy = [ "timers.target" ];
timerConfig = {
OnUnitActiveSec = "4h";
OnBootSec = "30min";
RandomizedDelaySec = "10min";
};
};
users.users.${config.services.qbittorrent.user}.extraGroups = [
service_configs.media_group
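
The checkpoint unit above shells out to the `sqlite3` CLI. The same operation can be sketched with Python's standard `sqlite3` module to show what `PRAGMA wal_checkpoint(TRUNCATE)` does to the `-wal` file when issued from a second connection. This is an illustration against a throwaway database, not qBittorrent's `torrents.db`, and it does not reproduce the long-lived read transaction described in the comment. `checkpoint_wal` and `demo` are hypothetical names.

```python
import os
import sqlite3
import tempfile


def checkpoint_wal(db_path):
    """Run a TRUNCATE checkpoint from a separate connection; return the busy flag."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns one row: (busy, wal_frames, checkpointed_frames).
        busy, _wal_frames, _checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE);"
        ).fetchone()
    finally:
        conn.close()
    return busy


def demo():
    tmpdir = tempfile.mkdtemp()
    db_path = os.path.join(tmpdir, "demo.db")

    # The "application" connection stays open, as the real service would.
    writer = sqlite3.connect(db_path)
    writer.execute("PRAGMA journal_mode=WAL;")
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    writer.commit()

    wal_before = os.path.getsize(db_path + "-wal")
    busy = checkpoint_wal(db_path)
    wal_after = os.path.getsize(db_path + "-wal")
    writer.close()
    return wal_before, busy, wal_after


print(demo())
```

A non-zero busy flag would mean the checkpoint could not complete, which is worth logging if this approach is adapted for monitoring.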


@@ -19,6 +19,10 @@
"Z ${service_configs.slskd.downloads} 0750 ${config.services.slskd.user} music"
"Z ${service_configs.slskd.incomplete} 0750 ${config.services.slskd.user} music"
])
(lib.mkCaddyReverseProxy {
subdomain = "soulseek";
port = service_configs.ports.private.soulseek_web.port;
})
];
users.groups."music" = { };
@@ -58,11 +62,6 @@
users.users.${config.services.jellyfin.user}.extraGroups = [ "music" ];
users.users.${username}.extraGroups = [ "music" ];
# doesn't work with auth????
services.caddy.virtualHosts."soulseek.${service_configs.https.domain}".extraConfig = ''
reverse_proxy :${builtins.toString config.services.slskd.settings.web.port}
'';
networking.firewall.allowedTCPPorts = [
service_configs.ports.public.soulseek_listen.port
];


@@ -31,5 +31,8 @@
# used for deploying configs to server
users.users.root.openssh.authorizedKeys.keys =
-config.users.users.${username}.openssh.authorizedKeys.keys;
+config.users.users.${username}.openssh.authorizedKeys.keys
+++ [
+"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIC5ZYN6idL/w/mUIfPOH1i+Q/SQXuzAMQUEuWpipx1Pc ci-deploy@muffin"
+];
}


@@ -17,6 +17,11 @@
"Z ${service_configs.syncthing.signalBackupDir} 0750 ${config.services.syncthing.user} ${config.services.syncthing.group}"
"Z ${service_configs.syncthing.grayjayBackupDir} 0750 ${config.services.syncthing.user} ${config.services.syncthing.group}"
])
(lib.mkCaddyReverseProxy {
subdomain = "syncthing";
port = service_configs.ports.private.syncthing_gui.port;
auth = true;
})
];
services.syncthing = {
@@ -49,9 +54,4 @@
];
};
services.caddy.virtualHosts."syncthing.${service_configs.https.domain}".extraConfig = ''
import ${config.age.secrets.caddy_auth.path}
reverse_proxy :${toString service_configs.ports.private.syncthing_gui.port}
'';
}

services/trilium.nix (new file)

@@ -0,0 +1,27 @@
{
config,
pkgs,
service_configs,
lib,
...
}:
{
imports = [
(lib.serviceMountWithZpool "trilium-server" service_configs.zpool_ssds [
(service_configs.services_dir + "/trilium")
])
(lib.mkCaddyReverseProxy {
subdomain = "notes";
port = service_configs.ports.private.trilium.port;
auth = true;
})
];
services.trilium-server = {
enable = true;
port = service_configs.ports.private.trilium.port;
host = "127.0.0.1";
dataDir = service_configs.trilium.dataDir;
};
}


@@ -30,7 +30,7 @@ let
{ config, pkgs, ... }:
{
imports = [
-(import ../services/jellyfin.nix {
+(import ../services/jellyfin/jellyfin.nix {
inherit config pkgs;
lib = testLib;
service_configs = testServiceConfigs;
@@ -107,7 +107,7 @@ pkgs.testers.runNixOSTest {
server.wait_for_unit("jellyfin.service")
server.wait_for_unit("fail2ban.service")
server.wait_for_open_port(8096)
-server.wait_until_succeeds("curl -sf http://localhost:8096/health | grep -q Healthy", timeout=60)
+server.wait_until_succeeds("curl -sf http://localhost:8096/health | grep -q Healthy", timeout=120)
time.sleep(2)
# Wait for Jellyfin to create real log files and reload fail2ban

tests/gitea-runner.nix (new file)

@@ -0,0 +1,60 @@
{
config,
lib,
pkgs,
...
}:
pkgs.testers.runNixOSTest {
name = "gitea-runner";
nodes.machine =
{ pkgs, ... }:
{
services.gitea = {
enable = true;
database.type = "sqlite3";
settings = {
server = {
HTTP_PORT = 3000;
ROOT_URL = "http://localhost:3000";
DOMAIN = "localhost";
};
actions.ENABLED = true;
service.DISABLE_REGISTRATION = true;
};
};
specialisation.runner = {
inheritParentConfig = true;
configuration.services.gitea-actions-runner.instances.test = {
enable = true;
name = "ci";
url = "http://localhost:3000";
labels = [ "native:host" ];
tokenFile = "/var/lib/gitea/runner_token";
};
};
};
testScript = ''
start_all()
machine.wait_for_unit("gitea.service")
machine.wait_for_open_port(3000)
# Generate runner token
machine.succeed(
"su -l gitea -s /bin/sh -c '${pkgs.gitea}/bin/gitea actions generate-runner-token --work-path /var/lib/gitea' | tail -1 | sed 's/^/TOKEN=/' > /var/lib/gitea/runner_token"
)
# Switch to runner specialisation
machine.succeed(
"/run/current-system/specialisation/runner/bin/switch-to-configuration test"
)
# Start the runner (specialisation switch doesn't auto-start new services)
machine.succeed("systemctl start gitea-runner-test.service")
machine.wait_for_unit("gitea-runner-test.service")
machine.succeed("sleep 5")
machine.succeed("test -f /var/lib/gitea-runner/test/.runner")
'';
}


@@ -0,0 +1,190 @@
{
lib,
pkgs,
...
}:
let
jfLib = import ./jellyfin-test-lib.nix { inherit pkgs lib; };
mockGrafana = ./mock-grafana-server.py;
script = ../services/grafana/jellyfin-annotations.py;
python = pkgs.python3;
in
pkgs.testers.runNixOSTest {
name = "jellyfin-annotations";
nodes.machine =
{ pkgs, ... }:
{
imports = [ jfLib.jellyfinTestConfig ];
environment.systemPackages = [ pkgs.python3 ];
};
testScript = ''
import json
import time
import importlib.util
_spec = importlib.util.spec_from_file_location("jf_helpers", "${jfLib.helpers}")
assert _spec and _spec.loader
_jf = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(_jf)
setup_jellyfin = _jf.setup_jellyfin
jellyfin_api = _jf.jellyfin_api
GRAFANA_PORT = 13000
ANNOTS_FILE = "/tmp/annotations.json"
STATE_FILE = "/tmp/annotations-state.json"
CREDS_DIR = "/tmp/test-creds"
PYTHON = "${python}/bin/python3"
MOCK_GRAFANA = "${mockGrafana}"
SCRIPT = "${script}"
auth_header = 'MediaBrowser Client="Infuse", DeviceId="test-dev-1", Device="iPhone", Version="1.0"'
auth_header2 = 'MediaBrowser Client="Jellyfin Web", DeviceId="test-dev-2", Device="Chrome", Version="1.0"'
def read_annotations():
out = machine.succeed(f"cat {ANNOTS_FILE} 2>/dev/null || echo '[]'")
return json.loads(out.strip())
start_all()
token, user_id, movie_id, media_source_id = setup_jellyfin(
machine, retry, auth_header,
"${jfLib.payloads.auth}", "${jfLib.payloads.empty}",
)
with subtest("Setup mock Grafana and credentials"):
machine.succeed(f"mkdir -p {CREDS_DIR}")
machine.succeed(f"echo '{token}' > {CREDS_DIR}/jellyfin-api-key")
machine.succeed(f"echo '[]' > {ANNOTS_FILE}")
machine.succeed(
f"systemd-run --unit=mock-grafana {PYTHON} {MOCK_GRAFANA} {GRAFANA_PORT} {ANNOTS_FILE}"
)
machine.wait_until_succeeds(
f"curl -sf -X POST http://127.0.0.1:{GRAFANA_PORT}/api/annotations "
f"-H 'Content-Type: application/json' -d '{{\"text\":\"ping\",\"tags\":[]}}' | grep -q id",
timeout=10,
)
machine.succeed(f"echo '[]' > {ANNOTS_FILE}")
with subtest("Start annotation service"):
machine.succeed(
f"systemd-run --unit=annotations-svc "
f"--setenv=JELLYFIN_URL=http://127.0.0.1:8096 "
f"--setenv=GRAFANA_URL=http://127.0.0.1:{GRAFANA_PORT} "
f"--setenv=CREDENTIALS_DIRECTORY={CREDS_DIR} "
f"--setenv=STATE_FILE={STATE_FILE} "
f"--setenv=POLL_INTERVAL=3 "
f"{PYTHON} {SCRIPT}"
)
time.sleep(2)
with subtest("No annotations when no streams active"):
time.sleep(4)
annots = read_annotations()
assert annots == [], f"Expected no annotations, got: {annots}"
with subtest("Annotation created when playback starts"):
playback_start = json.dumps({
"ItemId": movie_id,
"MediaSourceId": media_source_id,
"PlaySessionId": "test-play-1",
"CanSeek": True,
"IsPaused": False,
})
machine.succeed(
f"curl -sf -X POST 'http://localhost:8096/Sessions/Playing' "
f"-d '{playback_start}' -H 'Content-Type:application/json' "
f"-H 'X-Emby-Authorization:{auth_header}, Token={token}'"
)
machine.wait_until_succeeds(
f"cat {ANNOTS_FILE} | python3 -c \"import sys,json; a=json.load(sys.stdin); exit(0 if a else 1)\"",
timeout=15,
)
annots = read_annotations()
assert len(annots) == 1, f"Expected 1 annotation, got: {annots}"
text = annots[0]["text"]
assert "jellyfin" in annots[0].get("tags", []), f"Missing jellyfin tag: {annots[0]}"
assert "Test Movie" in text, f"Missing title in: {text}"
assert "Infuse" in text, f"Missing client in: {text}"
assert "iPhone" in text, f"Missing device in: {text}"
assert "timeEnd" not in annots[0], f"timeEnd should not be set yet: {annots[0]}"
with subtest("Annotation closed when playback stops"):
playback_stop = json.dumps({
"ItemId": movie_id,
"MediaSourceId": media_source_id,
"PlaySessionId": "test-play-1",
"PositionTicks": 50000000,
})
machine.succeed(
f"curl -sf -X POST 'http://localhost:8096/Sessions/Playing/Stopped' "
f"-d '{playback_stop}' -H 'Content-Type:application/json' "
f"-H 'X-Emby-Authorization:{auth_header}, Token={token}'"
)
machine.wait_until_succeeds(
f"cat {ANNOTS_FILE} | python3 -c \"import sys,json; a=json.load(sys.stdin); exit(0 if a and 'timeEnd' in a[0] else 1)\"",
timeout=15,
)
annots = read_annotations()
assert len(annots) == 1, f"Expected 1 annotation, got: {annots}"
assert "timeEnd" in annots[0], f"timeEnd should be set: {annots[0]}"
assert annots[0]["timeEnd"] > annots[0]["time"], "timeEnd should be after time"
with subtest("Multiple concurrent streams each get their own annotation"):
machine.succeed(f"echo '[]' > {ANNOTS_FILE}")
auth_result2 = json.loads(machine.succeed(
f"curl -sf -X POST 'http://localhost:8096/Users/AuthenticateByName' "
f"-d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' "
f"-H 'X-Emby-Authorization:{auth_header2}'"
))
token2 = auth_result2["AccessToken"]
playback1 = json.dumps({
"ItemId": movie_id,
"MediaSourceId": media_source_id,
"PlaySessionId": "test-play-multi-1",
"CanSeek": True,
"IsPaused": False,
})
machine.succeed(
f"curl -sf -X POST 'http://localhost:8096/Sessions/Playing' "
f"-d '{playback1}' -H 'Content-Type:application/json' "
f"-H 'X-Emby-Authorization:{auth_header}, Token={token}'"
)
playback2 = json.dumps({
"ItemId": movie_id,
"MediaSourceId": media_source_id,
"PlaySessionId": "test-play-multi-2",
"CanSeek": True,
"IsPaused": False,
})
machine.succeed(
f"curl -sf -X POST 'http://localhost:8096/Sessions/Playing' "
f"-d '{playback2}' -H 'Content-Type:application/json' "
f"-H 'X-Emby-Authorization:{auth_header2}, Token={token2}'"
)
machine.wait_until_succeeds(
f"cat {ANNOTS_FILE} | python3 -c \"import sys,json; a=json.load(sys.stdin); exit(0 if len(a)==2 else 1)\"",
timeout=15,
)
annots = read_annotations()
assert len(annots) == 2, f"Expected 2 annotations, got: {annots}"
with subtest("State survives service restart (no duplicate annotations)"):
machine.succeed("systemctl stop annotations-svc || true")
time.sleep(1)
machine.succeed(
f"systemd-run --unit=annotations-svc-2 "
f"--setenv=JELLYFIN_URL=http://127.0.0.1:8096 "
f"--setenv=GRAFANA_URL=http://127.0.0.1:{GRAFANA_PORT} "
f"--setenv=CREDENTIALS_DIRECTORY={CREDS_DIR} "
f"--setenv=STATE_FILE={STATE_FILE} "
f"--setenv=POLL_INTERVAL=3 "
f"{PYTHON} {SCRIPT}"
)
time.sleep(6)
annots = read_annotations()
assert len(annots) == 2, f"Restart should not create duplicates, got: {annots}"
'';
}
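
The test script above pulls its shared helpers in by file path with `importlib.util` rather than a normal `import`, since the helper file lives outside the test driver's module search path. A minimal standalone sketch of that loading pattern is shown below. The temporary `helpers.py` file and its `greet` function are stand-ins invented for the example, not the contents of the real helper module.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Load a Python source file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    # spec is None if no loader can be inferred for the given path.
    assert spec and spec.loader
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "helpers.py")
        with open(path, "w") as f:
            f.write("def greet(name):\n    return 'hello ' + name\n")
        helpers = load_module_from_path("demo_helpers", path)
        return helpers.greet("jellyfin")


print(demo())
```

Wrapping the four-line sequence in one function would let both test scripts share it, at the cost of one more indirection inside the Nix string.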


@@ -5,9 +5,21 @@
...
}:
let
payloads = {
auth = pkgs.writeText "auth.json" (builtins.toJSON { Username = "jellyfin"; });
empty = pkgs.writeText "empty.json" (builtins.toJSON { });
jfLib = import ./jellyfin-test-lib.nix { inherit pkgs lib; };
webhookPlugin = import ../services/jellyfin/jellyfin-webhook-plugin.nix { inherit pkgs lib; };
configureWebhook = webhookPlugin.mkConfigureScript {
jellyfinUrl = "http://localhost:8096";
webhooks = [
{
name = "qBittorrent Monitor";
uri = "http://127.0.0.1:9898/";
notificationTypes = [
"PlaybackStart"
"PlaybackProgress"
"PlaybackStop"
];
}
];
};
in
pkgs.testers.runNixOSTest {
@@ -18,11 +30,10 @@ pkgs.testers.runNixOSTest {
{ ... }:
{
imports = [
jfLib.jellyfinTestConfig
inputs.vpn-confinement.nixosModules.default
];
services.jellyfin.enable = true;
# Real qBittorrent service
services.qbittorrent = {
enable = true;
@@ -56,11 +67,6 @@ pkgs.testers.runNixOSTest {
};
};
environment.systemPackages = with pkgs; [
curl
ffmpeg
];
virtualisation.diskSize = 3 * 1024;
networking.firewall.allowedTCPPorts = [
8096
8080
@@ -78,11 +84,30 @@ pkgs.testers.runNixOSTest {
}
];
-# Create directories for qBittorrent
+# Create directories for qBittorrent.
systemd.tmpfiles.rules = [
"d /var/lib/qbittorrent/downloads 0755 qbittorrent qbittorrent"
"d /var/lib/qbittorrent/incomplete 0755 qbittorrent qbittorrent"
];
# Install the Jellyfin Webhook plugin before Jellyfin starts, mirroring
# the production module. Jellyfin rewrites meta.json at runtime so a
# read-only nix-store symlink would fail — we materialise a writable copy.
systemd.services."jellyfin-webhook-install" = {
description = "Install Jellyfin Webhook plugin files";
before = [ "jellyfin.service" ];
wantedBy = [ "jellyfin.service" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
User = "jellyfin";
Group = "jellyfin";
UMask = "0077";
ExecStart = webhookPlugin.mkInstallScript {
pluginsDir = "/var/lib/jellyfin/plugins";
};
};
};
};
# Public test IP (RFC 5737 TEST-NET-3) so Jellyfin sees it as external
@@ -106,20 +131,17 @@ pkgs.testers.runNixOSTest {
testScript = ''
import json
import time
from urllib.parse import urlencode
import importlib.util
_spec = importlib.util.spec_from_file_location("jf_helpers", "${jfLib.helpers}")
assert _spec and _spec.loader
_jf = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(_jf)
setup_jellyfin = _jf.setup_jellyfin
jellyfin_api = _jf.jellyfin_api
auth_header = 'MediaBrowser Client="NixOS Test", DeviceId="test-1337", Device="TestDevice", Version="1.0"'
def api_get(path, token=None):
header = auth_header + (f", Token={token}" if token else "")
return f"curl -sf 'http://server:8096{path}' -H 'X-Emby-Authorization:{header}'"
def api_post(path, json_file=None, token=None):
header = auth_header + (f", Token={token}" if token else "")
if json_file:
return f"curl -sf -X POST 'http://server:8096{path}' -d '@{json_file}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{header}'"
return f"curl -sf -X POST 'http://server:8096{path}' -H 'X-Emby-Authorization:{header}'"
def is_throttled():
return server.succeed("curl -s http://localhost:8080/api/v2/transfer/speedLimitsMode").strip() == "1"
@@ -137,61 +159,19 @@ pkgs.testers.runNixOSTest {
return False
return all(t["state"].startswith("stopped") for t in torrents)
movie_id: str = ""
media_source_id: str = ""
start_all()
server.wait_for_unit("jellyfin.service")
server.wait_for_open_port(8096)
server.wait_until_succeeds("curl -sf http://localhost:8096/health | grep -q Healthy", timeout=60)
server.wait_for_unit("qbittorrent.service")
server.wait_for_open_port(8080)
# Wait for qBittorrent WebUI to be responsive
server.wait_until_succeeds("curl -sf http://localhost:8080/api/v2/app/version", timeout=30)
with subtest("Complete Jellyfin setup wizard"):
server.wait_until_succeeds(api_get("/Startup/Configuration"))
server.succeed(api_get("/Startup/FirstUser"))
server.succeed(api_post("/Startup/Complete"))
with subtest("Authenticate and get token"):
auth_result = json.loads(server.succeed(api_post("/Users/AuthenticateByName", "${payloads.auth}")))
token = auth_result["AccessToken"]
user_id = auth_result["User"]["Id"]
with subtest("Create test video library"):
tempdir = server.succeed("mktemp -d -p /var/lib/jellyfin").strip()
server.succeed(f"chmod 755 '{tempdir}'")
server.succeed(f"ffmpeg -f lavfi -i testsrc2=duration=5 '{tempdir}/Test Movie (2024) [1080p].mkv'")
add_folder_query = urlencode({
"name": "Test Library",
"collectionType": "Movies",
"paths": tempdir,
"refreshLibrary": "true",
})
server.succeed(api_post(f"/Library/VirtualFolders?{add_folder_query}", "${payloads.empty}", token))
def is_library_ready(_):
folders = json.loads(server.succeed(api_get("/Library/VirtualFolders", token)))
return all(f.get("RefreshStatus") == "Idle" for f in folders)
retry(is_library_ready, timeout=60)
def get_movie(_):
global movie_id, media_source_id
items = json.loads(server.succeed(api_get(f"/Users/{user_id}/Items?IncludeItemTypes=Movie&Recursive=true", token)))
if items["TotalRecordCount"] > 0:
movie_id = items["Items"][0]["Id"]
item_info = json.loads(server.succeed(api_get(f"/Users/{user_id}/Items/{movie_id}", token)))
media_source_id = item_info["MediaSources"][0]["Id"]
return True
return False
retry(get_movie, timeout=60)
token, user_id, movie_id, media_source_id = setup_jellyfin(
server, retry, auth_header,
"${jfLib.payloads.auth}", "${jfLib.payloads.empty}",
)
with subtest("Start monitor service"):
python = "${pkgs.python3.withPackages (ps: [ ps.requests ])}/bin/python"
-monitor = "${../services/jellyfin-qbittorrent-monitor.py}"
+monitor = "${../services/jellyfin/jellyfin-qbittorrent-monitor.py}"
server.succeed(f"""
systemd-run --unit=monitor-test \
--setenv=JELLYFIN_URL=http://localhost:8096 \
@@ -214,12 +194,12 @@ pkgs.testers.runNixOSTest {
server_ip = "192.168.1.1"
with subtest("Client authenticates from external network"):
-auth_cmd = f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}'"
+auth_cmd = f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}'"
client_auth_result = json.loads(client.succeed(auth_cmd))
client_token = client_auth_result["AccessToken"]
with subtest("Second client authenticates from external network"):
-auth_cmd2 = f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth2}'"
+auth_cmd2 = f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth2}'"
client_auth_result2 = json.loads(client.succeed(auth_cmd2))
client_token2 = client_auth_result2["AccessToken"]
@@ -430,7 +410,7 @@ pkgs.testers.runNixOSTest {
with subtest("Local playback does NOT trigger throttling"):
local_auth = 'MediaBrowser Client="Local Client", DeviceId="local-1111", Device="LocalDevice", Version="1.0"'
local_auth_result = json.loads(server.succeed(
-f"curl -sf -X POST 'http://localhost:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{local_auth}'"
+f"curl -sf -X POST 'http://localhost:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{local_auth}'"
))
local_token = local_auth_result["AccessToken"]
@@ -448,6 +428,97 @@ pkgs.testers.runNixOSTest {
local_playback["PositionTicks"] = 50000000
server.succeed(f"curl -sf -X POST 'http://localhost:8096/Sessions/Playing/Stopped' -d '{json.dumps(local_playback)}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{local_auth}, Token={local_token}'")
# === WEBHOOK TESTS ===
#
# Configure the Jellyfin Webhook plugin to target the monitor, then verify
# that the real Jellyfin -> plugin -> monitor path reacts faster than any
# possible poll. CHECK_INTERVAL=30 rules out polling as the cause.
WEBHOOK_PORT = 9898
WEBHOOK_CREDS = "/tmp/webhook-creds"
# Start a webhook-enabled monitor with long poll interval.
server.succeed("systemctl stop monitor-test || true")
time.sleep(1)
server.succeed(f"""
systemd-run --unit=monitor-webhook \
--setenv=JELLYFIN_URL=http://localhost:8096 \
--setenv=JELLYFIN_API_KEY={token} \
--setenv=QBITTORRENT_URL=http://localhost:8080 \
--setenv=CHECK_INTERVAL=30 \
--setenv=STREAMING_START_DELAY=1 \
--setenv=STREAMING_STOP_DELAY=1 \
--setenv=TOTAL_BANDWIDTH_BUDGET=50000000 \
--setenv=SERVICE_BUFFER=2000000 \
--setenv=DEFAULT_STREAM_BITRATE=10000000 \
--setenv=MIN_TORRENT_SPEED=100 \
--setenv=WEBHOOK_PORT={WEBHOOK_PORT} \
--setenv=WEBHOOK_BIND=127.0.0.1 \
{python} {monitor}
""")
server.wait_until_succeeds(f"ss -ltn | grep -q ':{WEBHOOK_PORT}'", timeout=15)
time.sleep(2)
assert not is_throttled(), "Should start unthrottled"
# Drop the admin token where the configure script expects it (production uses agenix).
server.succeed(f"mkdir -p {WEBHOOK_CREDS} && echo '{token}' > {WEBHOOK_CREDS}/jellyfin-api-key")
server.succeed(
f"systemd-run --wait --unit=webhook-configure-test "
f"--setenv=CREDENTIALS_DIRECTORY={WEBHOOK_CREDS} "
f"${configureWebhook}"
)
with subtest("Real PlaybackStart event throttles via the plugin"):
playback_start = {
"ItemId": movie_id,
"MediaSourceId": media_source_id,
"PlaySessionId": "test-plugin-start",
"CanSeek": True,
"IsPaused": False,
}
start_cmd = f"curl -sf -X POST 'http://{server_ip}:8096/Sessions/Playing' -d '{json.dumps(playback_start)}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}, Token={client_token}'"
client.succeed(start_cmd)
server.wait_until_succeeds(
"curl -sf http://localhost:8080/api/v2/transfer/speedLimitsMode | grep -q '^1$'",
timeout=5,
)
# Let STREAMING_STOP_DELAY (1s) elapse so the upcoming stop is not swallowed by hysteresis.
time.sleep(2)
with subtest("Real PlaybackStop event unthrottles via the plugin"):
playback_stop = {
"ItemId": movie_id,
"MediaSourceId": media_source_id,
"PlaySessionId": "test-plugin-start",
"PositionTicks": 50000000,
}
stop_cmd = f"curl -sf -X POST 'http://{server_ip}:8096/Sessions/Playing/Stopped' -d '{json.dumps(playback_stop)}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}, Token={client_token}'"
client.succeed(stop_cmd)
server.wait_until_succeeds(
"curl -sf http://localhost:8080/api/v2/transfer/speedLimitsMode | grep -q '^0$'",
timeout=10,
)
# Restore fast-polling monitor for the service-restart tests below.
server.succeed("systemctl stop monitor-webhook || true")
time.sleep(1)
server.succeed(f"""
systemd-run --unit=monitor-test \
--setenv=JELLYFIN_URL=http://localhost:8096 \
--setenv=JELLYFIN_API_KEY={token} \
--setenv=QBITTORRENT_URL=http://localhost:8080 \
--setenv=CHECK_INTERVAL=1 \
--setenv=STREAMING_START_DELAY=1 \
--setenv=STREAMING_STOP_DELAY=1 \
--setenv=TOTAL_BANDWIDTH_BUDGET=50000000 \
--setenv=SERVICE_BUFFER=2000000 \
--setenv=DEFAULT_STREAM_BITRATE=10000000 \
--setenv=MIN_TORRENT_SPEED=100 \
{python} {monitor}
""")
time.sleep(2)
# === SERVICE RESTART TESTS ===
with subtest("qBittorrent restart during throttled state re-applies throttling"):
@@ -527,11 +598,11 @@ pkgs.testers.runNixOSTest {
# Re-authenticate (old token invalid after restart)
client_auth_result = json.loads(client.succeed(
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}'"
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}'"
))
client_token = client_auth_result["AccessToken"]
client_auth_result2 = json.loads(client.succeed(
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth2}'"
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth2}'"
))
client_token2 = client_auth_result2["AccessToken"]
@@ -542,11 +613,11 @@ pkgs.testers.runNixOSTest {
with subtest("Monitor recovers after Jellyfin temporary unavailability"):
# Re-authenticate with fresh token
client_auth_result = json.loads(client.succeed(
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}'"
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth}'"
))
client_token = client_auth_result["AccessToken"]
client_auth_result2 = json.loads(client.succeed(
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth2}'"
f"curl -sf -X POST 'http://{server_ip}:8096/Users/AuthenticateByName' -d '@${jfLib.payloads.auth}' -H 'Content-Type:application/json' -H 'X-Emby-Authorization:{client_auth2}'"
))
client_token2 = client_auth_result2["AccessToken"]


@@ -0,0 +1,20 @@
{ pkgs, lib }:
{
payloads = {
auth = pkgs.writeText "auth.json" (builtins.toJSON { Username = "jellyfin"; });
empty = pkgs.writeText "empty.json" (builtins.toJSON { });
};
helpers = ./jellyfin-test-lib.py;
jellyfinTestConfig =
{ pkgs, ... }:
{
services.jellyfin.enable = true;
environment.systemPackages = with pkgs; [
curl
ffmpeg
];
virtualisation.diskSize = lib.mkDefault (3 * 1024);
};
}


@@ -0,0 +1,90 @@
import json
from urllib.parse import urlencode
def jellyfin_api(machine, method, path, auth_header, token=None, data_file=None, data=None):
hdr = auth_header + (f", Token={token}" if token else "")
cmd = f"curl -sf -X {method} 'http://localhost:8096{path}'"
if data_file:
cmd += f" -d '@{data_file}' -H 'Content-Type:application/json'"
elif data:
payload = json.dumps(data) if isinstance(data, dict) else data
cmd += f" -d '{payload}' -H 'Content-Type:application/json'"
cmd += f" -H 'X-Emby-Authorization:{hdr}'"
return machine.succeed(cmd)
def setup_jellyfin(machine, retry, auth_header, auth_payload, empty_payload):
machine.wait_for_unit("jellyfin.service")
machine.wait_for_open_port(8096)
machine.wait_until_succeeds(
"curl -sf http://localhost:8096/health | grep -q Healthy", timeout=120
)
machine.wait_until_succeeds(
f"curl -sf 'http://localhost:8096/Startup/Configuration' "
f"-H 'X-Emby-Authorization:{auth_header}'"
)
jellyfin_api(machine, "GET", "/Startup/FirstUser", auth_header)
jellyfin_api(machine, "POST", "/Startup/Complete", auth_header)
result = json.loads(
jellyfin_api(
machine, "POST", "/Users/AuthenticateByName",
auth_header, data_file=auth_payload,
)
)
token = result["AccessToken"]
user_id = result["User"]["Id"]
tempdir = machine.succeed("mktemp -d -p /var/lib/jellyfin").strip()
machine.succeed(f"chmod 755 '{tempdir}'")
machine.succeed(
f"ffmpeg -f lavfi -i testsrc2=duration=5 -f lavfi -i sine=frequency=440:duration=5 "
f"-c:v libx264 -c:a aac '{tempdir}/Test Movie (2024).mkv'"
)
query = urlencode({
"name": "Test Library",
"collectionType": "Movies",
"paths": tempdir,
"refreshLibrary": "true",
})
jellyfin_api(
machine, "POST", f"/Library/VirtualFolders?{query}",
auth_header, token=token, data_file=empty_payload,
)
def is_ready(_):
folders = json.loads(
jellyfin_api(machine, "GET", "/Library/VirtualFolders", auth_header, token=token)
)
return all(f.get("RefreshStatus") == "Idle" for f in folders)
retry(is_ready, timeout=60)
movie_id = None
media_source_id = None
def get_movie(_):
nonlocal movie_id, media_source_id
items = json.loads(
jellyfin_api(
machine, "GET",
f"/Users/{user_id}/Items?IncludeItemTypes=Movie&Recursive=true",
auth_header, token=token,
)
)
if items["TotalRecordCount"] > 0:
movie_id = items["Items"][0]["Id"]
info = json.loads(
jellyfin_api(
machine, "GET", f"/Users/{user_id}/Items/{movie_id}",
auth_header, token=token,
)
)
media_source_id = info["MediaSources"][0]["Id"]
return True
return False
retry(get_movie, timeout=60)
return token, user_id, movie_id, media_source_id
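The helper above never defines `retry`; it receives it from the caller and invokes it as `retry(fn, timeout=60)` with a one-argument callback. A minimal stand-in with that shape might look like the sketch below. The meaning of the callback argument (a "last attempt" flag), the `interval` parameter, and the `TimeoutError` are assumptions for illustration and are not confirmed by this diff.

```python
import time
from typing import Callable


def retry(fn: Callable[[bool], bool], timeout: int = 60, interval: float = 1.0) -> None:
    """Call fn(False) until it returns True or `timeout` attempts have passed.

    One final fn(True) call gives the callback a last chance (and tells it
    that this is the last attempt) before the failure is raised.
    """
    for _ in range(timeout):
        if fn(False):
            return
        time.sleep(interval)
    if not fn(True):
        raise TimeoutError(f"condition not met after {timeout} attempts")


# Usage: a predicate that only becomes true on its third call.
calls = []


def third_time_lucky(_last_attempt: bool) -> bool:
    calls.append(1)
    return len(calls) >= 3


retry(third_time_lucky, timeout=5, interval=0)
```

Callbacks such as `is_ready(_)` and `get_movie(_)` ignore the argument, which is why they declare it as `_`.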


@@ -0,0 +1,58 @@
import http.server, json, sys
PORT = int(sys.argv[1])
DATA_FILE = sys.argv[2]
class Handler(http.server.BaseHTTPRequestHandler):
def log_message(self, fmt, *args):
pass
def _read_body(self):
length = int(self.headers.get("Content-Length", 0))
return json.loads(self.rfile.read(length)) if length else {}
def _json(self, code, body):
data = json.dumps(body).encode()
self.send_response(code)
self.send_header("Content-Type", "application/json")
self.end_headers()
self.wfile.write(data)
def do_POST(self):
if self.path == "/api/annotations":
body = self._read_body()
try:
with open(DATA_FILE) as f:
annotations = json.load(f)
except Exception:
annotations = []
aid = len(annotations) + 1
body["id"] = aid
annotations.append(body)
with open(DATA_FILE, "w") as f:
json.dump(annotations, f)
self._json(200, {"id": aid, "message": "Annotation added"})
else:
self.send_response(404)
self.end_headers()
def do_PATCH(self):
if self.path.startswith("/api/annotations/"):
aid = int(self.path.rsplit("/", 1)[-1])
body = self._read_body()
try:
with open(DATA_FILE) as f:
annotations = json.load(f)
except Exception:
annotations = []
for a in annotations:
if a["id"] == aid:
a.update(body)
with open(DATA_FILE, "w") as f:
json.dump(annotations, f)
self._json(200, {"message": "Annotation patched"})
else:
self.send_response(404)
self.end_headers()
http.server.HTTPServer(("127.0.0.1", PORT), Handler).serve_forever()
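The store logic inside `do_POST` and `do_PATCH` can be read in isolation as two small operations on a list. The sketch below restates it as pure functions for illustration; the names `add_annotation` and `patch_annotation` are hypothetical and do not appear in the mock server. Unlike the mock, which answers 200 whether or not an id matched, the patch function here reports whether anything was updated.

```python
from typing import Any

Annotation = dict[str, Any]


def add_annotation(annotations: list[Annotation], body: Annotation) -> int:
    """Append `body` with the next sequential id, mirroring do_POST."""
    aid = len(annotations) + 1
    annotations.append({**body, "id": aid})
    return aid


def patch_annotation(annotations: list[Annotation], aid: int, body: Annotation) -> bool:
    """Merge `body` into the annotation with id `aid`; True if one matched."""
    for a in annotations:
        if a["id"] == aid:
            a.update(body)
            return True
    return False


# Usage: open an annotation, then close it with an end time.
store: list[Annotation] = []
aid = add_annotation(store, {"text": "ZFS scrub started", "tags": ["zfs-scrub"], "time": 1000})
patch_annotation(store, aid, {"timeEnd": 2000, "text": "ZFS scrub completed"})
```

Because ids are derived from the list length, they are only unique as long as annotations are never deleted, which holds for the mock since it has no delete endpoint.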


@@ -22,9 +22,20 @@ in
fail2banImmichTest = handleTest ./fail2ban-immich.nix;
fail2banJellyfinTest = handleTest ./fail2ban-jellyfin.nix;
# jellyfin annotation service test
jellyfinAnnotationsTest = handleTest ./jellyfin-annotations.nix;
# zfs scrub annotations test
zfsScrubAnnotationsTest = handleTest ./zfs-scrub-annotations.nix;
# xmrig auto-pause test
xmrigAutoPauseTest = handleTest ./xmrig-auto-pause.nix;
# ntfy alerts test
ntfyAlertsTest = handleTest ./ntfy-alerts.nix;
# torrent audit test
torrentAuditTest = handleTest ./torrent-audit.nix;
# gitea runner test
giteaRunnerTest = handleTest ./gitea-runner.nix;
}

tests/xmrig-auto-pause.nix Normal file

@@ -0,0 +1,206 @@
{
pkgs,
...
}:
let
script = ../services/monero/xmrig-auto-pause.py;
python = pkgs.python3;
in
pkgs.testers.runNixOSTest {
name = "xmrig-auto-pause";
nodes.machine =
{ pkgs, ... }:
{
environment.systemPackages = [
pkgs.python3
pkgs.procps
];
# Mock xmrig as a nice'd sleep process that can be stopped/started.
systemd.services.xmrig = {
description = "Mock xmrig miner";
serviceConfig = {
ExecStart = "${pkgs.coreutils}/bin/sleep infinity";
Type = "simple";
Nice = 19;
};
wantedBy = [ "multi-user.target" ];
};
};
testScript = ''
import time
PYTHON = "${python}/bin/python3"
SCRIPT = "${script}"
# Tuned for test VMs (1-2 cores).
# POLL_INTERVAL=1 keeps detection latency low.
# GRACE_PERIOD=5 is long enough to verify "stays stopped" but short
# enough that the full test completes in reasonable time.
# CPU_STOP_THRESHOLD=20 catches a busy-loop on a 1-2 core VM (50-100%)
# without triggering from normal VM noise.
# CPU_RESUME_THRESHOLD=10 is the idle cutoff for a 1-2 core VM.
POLL_INTERVAL = "1"
GRACE_PERIOD = "5"
CPU_STOP_THRESHOLD = "20"
CPU_RESUME_THRESHOLD = "10"
STARTUP_COOLDOWN = "4"
STATE_DIR = "/tmp/xap-state"
def start_cpu_load(name):
"""Start a non-nice CPU burn as a transient systemd unit."""
machine.succeed(
f"systemd-run --unit={name} --property=Type=exec "
f"bash -c 'while true; do :; done'"
)
def stop_cpu_load(name):
machine.succeed(f"systemctl stop {name}")
def start_monitor(unit_name):
"""Start the auto-pause monitor as a transient unit."""
machine.succeed(
f"systemd-run --unit={unit_name} "
f"--setenv=POLL_INTERVAL={POLL_INTERVAL} "
f"--setenv=GRACE_PERIOD={GRACE_PERIOD} "
f"--setenv=CPU_STOP_THRESHOLD={CPU_STOP_THRESHOLD} "
f"--setenv=CPU_RESUME_THRESHOLD={CPU_RESUME_THRESHOLD} "
f"--setenv=STARTUP_COOLDOWN={STARTUP_COOLDOWN} "
f"--setenv=STATE_DIR={STATE_DIR} "
f"{PYTHON} {SCRIPT}"
)
# Monitor needs two consecutive polls to compute a CPU delta.
time.sleep(3)
start_all()
machine.wait_for_unit("multi-user.target")
machine.wait_for_unit("xmrig.service")
machine.succeed(f"mkdir -p {STATE_DIR}")
with subtest("Start auto-pause monitor"):
start_monitor("xmrig-auto-pause")
with subtest("xmrig stays running while system is idle"):
machine.succeed("systemctl is-active xmrig")
with subtest("xmrig stopped when CPU load appears"):
start_cpu_load("cpu-load")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
with subtest("xmrig remains stopped during grace period after load ends"):
stop_cpu_load("cpu-load")
# Load just stopped. Grace period is 5s; a check at 2s is well within it.
time.sleep(2)
machine.fail("systemctl is-active xmrig")
with subtest("xmrig resumes after grace period expires"):
# Already idle since previous subtest. Grace period (5s) plus
# detection delay (~2 polls) plus startup cooldown (4s) means
# xmrig should restart within ~12s.
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=20)
with subtest("Intermittent load does not cause flapping"):
# First load: stop xmrig
start_cpu_load("cpu-load-1")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
stop_cpu_load("cpu-load-1")
# Brief idle gap shorter than grace period
time.sleep(2)
# Second load arrives before grace period expires
start_cpu_load("cpu-load-2")
time.sleep(3)
# xmrig must still be stopped
machine.fail("systemctl is-active xmrig")
stop_cpu_load("cpu-load-2")
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=20)
with subtest("Sustained load keeps xmrig stopped"):
start_cpu_load("cpu-load-3")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
# Stay busy longer than the grace period to prove continuous
# activity keeps xmrig stopped indefinitely.
time.sleep(8)
machine.fail("systemctl is-active xmrig")
stop_cpu_load("cpu-load-3")
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=20)
with subtest("External restart detected and re-stopped under load"):
# Put system under load so auto-pause stops xmrig.
start_cpu_load("cpu-load-4")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
# Something external starts xmrig while load is active.
# The script should detect this and re-stop it.
machine.succeed("systemctl start xmrig")
machine.succeed("systemctl is-active xmrig")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
stop_cpu_load("cpu-load-4")
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=20)
# --- State persistence and crash recovery ---
machine.succeed("systemctl stop xmrig-auto-pause")
with subtest("xmrig recovers after crash during startup cooldown"):
machine.succeed(f"rm -rf {STATE_DIR} && mkdir -p {STATE_DIR}")
start_monitor("xmrig-auto-pause-crash")
# Load -> xmrig stops
start_cpu_load("cpu-crash")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
# End load -> xmrig restarts after grace period
stop_cpu_load("cpu-crash")
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=30)
# Kill xmrig immediately to simulate a crash during startup cooldown.
# The script should detect the failure when cooldown expires and
# re-enter the retry cycle.
machine.succeed("systemctl kill --signal=KILL xmrig")
machine.wait_until_fails("systemctl is-active xmrig", timeout=5)
# After cooldown + grace period + restart, xmrig should be back.
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=30)
machine.succeed("systemctl stop xmrig-auto-pause-crash")
machine.succeed("systemctl reset-failed xmrig.service || true")
machine.succeed("systemctl start xmrig")
machine.wait_for_unit("xmrig.service")
with subtest("Script restart preserves pause state"):
machine.succeed(f"rm -rf {STATE_DIR} && mkdir -p {STATE_DIR}")
start_monitor("xmrig-auto-pause-persist")
# Load -> xmrig stops
start_cpu_load("cpu-persist")
machine.wait_until_fails("systemctl is-active xmrig", timeout=20)
# Kill the monitor while xmrig is paused (simulates script crash)
machine.succeed("systemctl stop xmrig-auto-pause-persist")
# State file must exist: the monitor persisted the pause flag
machine.succeed(f"test -f {STATE_DIR}/paused")
# Start a fresh monitor instance (reads state file on startup)
start_monitor("xmrig-auto-pause-persist2")
# End load: the new monitor should pick up the paused state
# and restart xmrig after the grace period
stop_cpu_load("cpu-persist")
machine.wait_until_succeeds("systemctl is-active xmrig", timeout=30)
# State file should be cleaned up after successful restart
machine.fail(f"test -f {STATE_DIR}/paused")
machine.succeed("systemctl stop xmrig-auto-pause-persist2")
'';
}
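The test comments note that the monitor needs two consecutive polls to compute a CPU delta, and that the load is deliberately non-nice while the mock xmrig runs at `Nice = 19`. The script under test (`xmrig-auto-pause.py`) is not shown in this diff, so the following is only a sketch of how such a delta could be computed from two `/proc/stat` samples. The function names and the choice to count nice, idle, and iowait time as "not busy" are assumptions.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu ...' line of /proc/stat into jiffy counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def busy_percent(prev: list[int], cur: list[int]) -> float:
    """Share of non-nice busy time between two samples, in percent.

    Field order: user nice system idle iowait irq softirq steal ...
    Assumption: nice, idle and iowait jiffies count as 'not busy', so a
    nice'd miner does not register as load.
    """
    delta = [c - p for p, c in zip(prev, cur)]
    total = sum(delta[:8])  # guest fields are already included in user/nice
    if total <= 0:
        return 0.0
    not_busy = delta[1] + delta[3] + delta[4]
    return 100.0 * (total - not_busy) / total


# Usage with two hand-written samples (values are illustrative).
first = parse_cpu_line("cpu 100 50 30 800 20 0 0 0")
second = parse_cpu_line("cpu 160 150 40 830 20 0 0 0")
usage = busy_percent(first, second)
```

A single sample only carries counters accumulated since boot, which is why no usage figure is available until the second poll.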


@@ -0,0 +1,123 @@
{
lib,
pkgs,
...
}:
let
mockServer = ./mock-grafana-server.py;
mockZpool = pkgs.writeShellScript "zpool" ''
case "$1" in
list)
echo "tank"
echo "hdds"
;;
status)
pool="$2"
if [ "$pool" = "tank" ]; then
echo " scan: scrub repaired 0B in 00:24:39 with 0 errors on Mon Jan 1 02:24:39 2024"
elif [ "$pool" = "hdds" ]; then
echo " scan: scrub repaired 0B in 04:12:33 with 0 errors on Mon Jan 1 06:12:33 2024"
fi
;;
esac
'';
script = ../services/grafana/zfs-scrub-annotations.sh;
python = pkgs.python3;
in
pkgs.testers.runNixOSTest {
name = "zfs-scrub-annotations";
nodes.machine =
{ pkgs, ... }:
{
environment.systemPackages = with pkgs; [
python3
curl
jq
];
};
testScript = ''
import json
GRAFANA_PORT = 13000
ANNOTS_FILE = "/tmp/annotations.json"
STATE_DIR = "/tmp/scrub-state"
PYTHON = "${python}/bin/python3"
MOCK = "${mockServer}"
SCRIPT = "${script}"
MOCK_ZPOOL = "${mockZpool}"
MOCK_BIN = "/tmp/mock-bin"
ENV_PREFIX = (
f"GRAFANA_URL=http://127.0.0.1:{GRAFANA_PORT} "
f"STATE_DIR={STATE_DIR} "
f"PATH={MOCK_BIN}:$PATH "
)
def read_annotations():
out = machine.succeed(f"cat {ANNOTS_FILE} 2>/dev/null || echo '[]'")
return json.loads(out.strip())
start_all()
machine.wait_for_unit("multi-user.target")
with subtest("Setup state directory and mock zpool"):
machine.succeed(f"mkdir -p {STATE_DIR}")
machine.succeed(f"mkdir -p {MOCK_BIN} && cp {MOCK_ZPOOL} {MOCK_BIN}/zpool && chmod +x {MOCK_BIN}/zpool")
with subtest("Start mock Grafana server"):
machine.succeed(f"echo '[]' > {ANNOTS_FILE}")
machine.succeed(
f"systemd-run --unit=mock-grafana {PYTHON} {MOCK} {GRAFANA_PORT} {ANNOTS_FILE}"
)
machine.wait_until_succeeds(
f"curl -sf -X POST http://127.0.0.1:{GRAFANA_PORT}/api/annotations "
f"-H 'Content-Type: application/json' -d '{{\"text\":\"ping\",\"tags\":[]}}' | grep -q id",
timeout=10,
)
machine.succeed(f"echo '[]' > {ANNOTS_FILE}")
with subtest("Start action creates annotation with pool names and zfs-scrub tag"):
machine.succeed(f"{ENV_PREFIX} bash {SCRIPT} start")
annots = read_annotations()
assert len(annots) == 1, f"Expected 1 annotation, got: {annots}"
assert "zfs-scrub" in annots[0].get("tags", []), f"Missing zfs-scrub tag: {annots[0]}"
assert "tank" in annots[0]["text"], f"Missing tank in text: {annots[0]['text']}"
assert "hdds" in annots[0]["text"], f"Missing hdds in text: {annots[0]['text']}"
assert "time" in annots[0], f"Missing time field: {annots[0]}"
assert "timeEnd" not in annots[0], f"timeEnd should not be set yet: {annots[0]}"
with subtest("State file contains annotation ID"):
ann_id = machine.succeed(f"cat {STATE_DIR}/annotation-id").strip()
assert ann_id == "1", f"Expected annotation ID 1, got: {ann_id}"
with subtest("Stop action closes annotation with per-pool scrub results"):
machine.succeed(f"{ENV_PREFIX} bash {SCRIPT} stop")
annots = read_annotations()
assert len(annots) == 1, f"Expected 1 annotation, got: {annots}"
assert "timeEnd" in annots[0], f"timeEnd should be set: {annots[0]}"
assert annots[0]["timeEnd"] > annots[0]["time"], "timeEnd should be after time"
text = annots[0]["text"]
assert "ZFS scrub completed" in text, f"Missing completed text: {text}"
assert "tank:" in text, f"Missing tank results: {text}"
assert "hdds:" in text, f"Missing hdds results: {text}"
assert "00:24:39" in text, f"Missing tank scrub duration: {text}"
assert "04:12:33" in text, f"Missing hdds scrub duration: {text}"
with subtest("State file cleaned up after stop"):
machine.fail(f"test -f {STATE_DIR}/annotation-id")
with subtest("Stop action handles missing state file gracefully"):
machine.succeed(f"{ENV_PREFIX} bash {SCRIPT} stop")
annots = read_annotations()
assert len(annots) == 1, f"Expected no new annotations, got: {annots}"
with subtest("Start action handles Grafana being down gracefully"):
machine.succeed("systemctl stop mock-grafana")
machine.succeed(f"{ENV_PREFIX} bash {SCRIPT} start")
machine.fail(f"test -f {STATE_DIR}/annotation-id")
'';
}
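The stop subtest expects the annotation text to carry each pool's scrub duration, taken from the `scan:` line that the mock `zpool status` prints. The real script is a shell script that is not part of this diff; as an illustration only, the same extraction could be sketched in Python as below. The pattern and the function name `parse_scrub_line` are assumptions, and the pattern covers only the line format emitted by the mock.

```python
import re

# Matches lines like:
#   scan: scrub repaired 0B in 00:24:39 with 0 errors on Mon Jan 1 02:24:39 2024
SCAN_RE = re.compile(
    r"scan:\s+scrub repaired (?P<repaired>\S+) in (?P<duration>\d+(?::\d{2}){2})"
    r" with (?P<errors>\d+) errors on (?P<finished>.+)$"
)


def parse_scrub_line(line: str):
    """Return the scrub fields as a dict, or None if the line does not match."""
    m = SCAN_RE.search(line.strip())
    return m.groupdict() if m else None


# Usage with the line the mock prints for the "tank" pool.
result = parse_scrub_line(
    " scan: scrub repaired 0B in 00:24:39 with 0 errors on Mon Jan 1 02:24:39 2024"
)
```

Returning `None` for unmatched input lets a caller skip pools whose status line is in a different state, such as a scrub that is still in progress.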