53 Commits

Author SHA1 Message Date
primary
96a0162b4e age-secrets: add git-crypt-key-nixos (pre-unify cutover)
Additive. The new unified nixos repo (projects/nixos/) uses a fresh git-crypt
key so we can retire the two per-repo keys later. Deploying this change alone
makes /run/agenix/git-crypt-key-nixos available on muffin; the nixos CI's
git-crypt unlock step can then succeed once the new repo lands on Gitea.
2026-04-18 01:19:17 -04:00
9ea45d4558 hardware: tighten mq-deadline read_expire for jellyfin coexistence 2026-04-17 19:47:20 -04:00
aecd9002b0 zfs tuning 2026-04-15 18:25:56 -04:00
a0085187a9 fix systemd-tmpfiles
All checks were successful
Build and Deploy / deploy (push) Successful in 3m14s
2026-04-14 21:59:08 -04:00
f28dd190bf move off of hardened kernel to latest LTS 2026-04-14 20:04:26 -04:00
100999734b ddns-updater: disable DynamicUser to fix secret perms
Some checks failed
Build and Deploy / deploy (push) Failing after 10s
2026-04-09 20:47:04 -04:00
ce1c335230 caddy: wildcard TLS via DNS-01 challenge + ddns-updater for Njalla
Some checks failed
Build and Deploy / deploy (push) Failing after 31m3s
Build Caddy with the caddy-dns/njalla plugin to enable DNS-01 ACME
challenges. This issues a single wildcard certificate for
*.sigkill.computer instead of per-subdomain certificates, reducing
Let's Encrypt API calls and certificate management overhead.

Add ddns-updater service (nixpkgs services.ddns-updater) configured
with Njalla provider to automatically update DNS records when the
server's public IP changes.
2026-04-09 19:54:57 -04:00
a3a6700106 grafana: replace disk-usage-collector with prometheus-zfs-exporter
The custom disk-usage-collector shell script + minutely timer is replaced
by prometheus-zfs-exporter (pdf/zfs_exporter, packaged in nixpkgs as
services.prometheus.exporters.zfs). The exporter provides pool capacity
metrics (allocated/free/size) natively.

Partition metrics (/boot, /persistent, /nix) now use node_exporter's
built-in filesystem collector (node_filesystem_*_bytes) which already
runs and collects these metrics.

Also fixes a latent race condition in serviceMountWithZpool: the -mounts
service now orders after zfs-mount.service (which runs 'zfs mount -a'),
not just after pool import. Without this, the mount check could run
before datasets are actually mounted.
2026-04-09 19:54:57 -04:00
75319256f3 lib: add mkCaddyReverseProxy, mkFail2banJail, mkGrafanaAnnotationService, extractArrApiKey 2026-04-09 19:54:57 -04:00
a5c7c91e38 Power: disable a bunch of things
All checks were successful
Build and Deploy / deploy (push) Successful in 1m42s
This BROKE the Intel Arc A380 completely because it was forced into the L1.1/L1.2
PCIe substates. Forcewaking the device would fail and it would never come up.

So I will be more conservative with power-saving tuning.
2026-04-07 19:08:08 -04:00
628c16fe64 fix git-crypt key for dotfiles workflow
All checks were successful
Build and Deploy / deploy (push) Successful in 2m32s
2026-04-07 13:51:19 -04:00
a76a7969d9 nix-cache
Some checks failed
Build and Deploy / deploy (push) Failing after 1h17m39s
2026-04-06 14:21:31 -04:00
3b8aedd502 fix hardened kernel with nix sandbox 2026-04-06 13:36:38 -04:00
3f62b9c88e grafana: replace custom metric collectors with community exporters
Replace three custom Prometheus textfile collector scripts with
dedicated community-maintained exporters:

- jellyfin-collector.nix (25 LoC shell) -> rebelcore/jellyfin_exporter
  Metric: jellyfin_active_streams -> count(jellyfin_now_playing_state)
  Bonus: per-session labels (user, title, device, codec info)

- qbittorrent-collector.nix (40 LoC shell) -> anriha/qbittorrent-metrics-exporter
  Metric: qbittorrent_{download,upload}_bytes_per_second -> qbit_{dl,up}speed
  Bonus: per-torrent metrics with category/tag aggregation

- intel-gpu-collector.nix + .py (130 LoC Python) -> mike1808/igpu-exporter
  Metric: intel_gpu_engine_busy_percent -> igpu_engines_busy_percent
  Bonus: persistent daemon vs oneshot timer, no streaming JSON parser

All three run as persistent daemons scraped by Prometheus, replacing
the textfile-collector pattern of systemd timers writing .prom files.
Dashboard PromQL queries updated to match new metric names.
2026-04-03 15:38:13 -04:00
37ac88fc0f lib: replace deprecated overrideDerivation with overrideAttrs
overrideDerivation has been deprecated since 2019. The new
overrideAttrs properly handles the env attribute set used by
modern derivations to avoid the NIX_CFLAGS_COMPILE overlap
error between env and top-level derivation arguments.
2026-04-03 15:18:22 -04:00
0aeb6c5523 llama-cpp: add API key auth via --api-key-file
Some checks failed
Build and Deploy / deploy (push) Failing after 2m49s
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
2026-04-02 18:02:23 -04:00
7e779ca0f7 power optimizations 2026-04-02 13:13:38 -04:00
5375f8ee34 gitea: add actions runner and CI/CD deploy workflow
This will avoid me having to run "deploy" myself on my laptop.
All I will need to do is push a commit and it will self-deploy.
2026-03-31 12:38:43 -04:00
e4feaa35ad secrets: migrate build-time secrets to agenix runtime
- coturn: switch static-auth-secret to static-auth-secret-file
- matrix: switch registration_token and turn_secret to file-based
- murmur: switch password to environmentFile with agenix
- p2pool: move public wallet address to service-configs.nix
2026-03-31 12:38:43 -04:00
eaeeed7f45 fix mq-deadline for hdds: 3 2026-03-31 12:38:42 -04:00
7de24b8870 fix mq-deadline for hdds: 2 2026-03-30 15:37:46 -04:00
eeab5de886 fix mq-deadline for hdds 2026-03-30 13:21:33 -04:00
9392749e66 mollysocket: init
Add mollysocket so we can use ntfy for Molly (Signal)
2026-03-30 13:05:22 -04:00
834f28f898 secureboot: cleanup script permissions 2026-03-28 04:15:26 -07:00
2409d1b01b zfs: tune hdds pool 2026-03-28 01:21:48 -07:00
fd3df23a76 firefox-syncserver: init 2026-03-21 10:26:28 -04:00
3b23aea374 monero+p2pool: move to ssds
I tried running these on my HDD array because I have more storage there,
but it is WAY too slow. So they need to live on the SSDs instead, as much
as it pains me to use my valuable SSD space.
2026-03-20 14:04:15 -04:00
c008fd2b18 zfs: don't specify zfs arc cache
Turns out, zfs is smart!
ZFS already has sane defaults, no sense in limiting the size of the cache.
2026-03-06 14:11:14 -05:00
3ccce88040 zfs: remove unneeded options 2026-03-06 13:47:06 -05:00
ad4d2d41fb zfs: tweak arc settings 2026-03-06 13:44:55 -05:00
f784f26848 monero: changes 2026-03-04 18:56:55 -05:00
b5be21ff8c secrets: cleanup activation scripts 2026-03-04 17:35:49 -05:00
d4b679d1a5 cleanup 2026-03-03 19:39:10 -05:00
cdccab855d zfs: zfs_txg_timeout 30 -> 120 2026-03-03 15:06:13 -05:00
ce4d1c0ef2 zfs: tuning 2026-03-03 14:31:42 -05:00
b977b578e0 arr-init: extract to standalone flake repo 2026-03-03 14:31:39 -05:00
dc9d58a543 ntfy-alerts: suppress notifications for sanoid 2026-03-03 14:31:38 -05:00
39a76a3265 zfs: fix sanoid dataset name for jellyfin cache 2026-03-03 14:31:37 -05:00
294cb6453e ntfy-alerts: init 2026-03-03 14:31:36 -05:00
745d0ea4c2 arr-init: add module for API-based configuration 2026-03-03 14:31:28 -05:00
fb305cc9f4 fmt 2026-03-03 14:31:20 -05:00
a9e8ce09d1 fix(no-rgb): handle transient hardware unavailability during deploy 2026-03-03 14:31:19 -05:00
0d1205210d feat(tmpfiles): defer per-service file permissions to reduce boot time 2026-03-03 14:31:18 -05:00
1db214aee5 impermanence: fix /etc permissions after re-deploy 2026-03-03 14:31:17 -05:00
12b681c8f2 cleanup 2026-03-03 14:31:05 -05:00
bd0c7cde6d tests: fix all fail2ban NixOS VM tests
- Add explicit iptables banaction in security.nix for test compatibility
- Force IPv4 in all curl requests to prevent IPv4/IPv6 mismatch issues
- Fix caddy test: use basic_auth directive (not basicauth)
- Override service ports in tests to match direct connections (not via Caddy)
- Vaultwarden: override ROCKET_ADDRESS and ROCKET_LOG for external access
- Immich: increase VM memory to 4GB for stability
- Jellyfin: create placeholder log file and reload fail2ban after startup
- Add tests.nix entries for all 6 fail2ban tests

All tests now pass: ssh, caddy, gitea, vaultwarden, immich, jellyfin
2026-03-03 14:30:59 -05:00
0e1aa6fe0e nit: move fail2ban to security module 2026-03-03 14:30:56 -05:00
3db2728dbe security things 2026-03-03 14:30:54 -05:00
5fe233e05e impermanence: fix /etc/zfs cache 2026-03-03 14:30:51 -05:00
65b49488d1 impermanence: fix persistent ssh host keys 2026-03-03 14:30:51 -05:00