Commit Graph

814 Commits

SHA1 Message Date
acfa08fc2e traccar: init 2026-04-12 21:04:16 -04:00
1f2886d35c AGENTS.md: document postgresql-first policy 2026-04-12 21:04:08 -04:00
674d3cf539 fix tests 2026-04-12 15:36:04 -04:00
bef4ac7ddc update
Some checks failed
Build and Deploy / deploy (push) Failing after 10m32s
2026-04-11 10:28:01 -04:00
12469de580 llama.cpp: things 2026-04-11 10:27:38 -04:00
dad3867144 grafana: fix llama-cpp annotation query format for Grafana 12
All checks were successful
Build and Deploy / deploy (push) Successful in 2m42s
Grafana 12 expects Prometheus annotation queries wrapped in a 'target'
object with datasource, expr, refId, and range fields. The previous
format had expr/step as top-level fields, which Grafana silently ignored.
2026-04-09 22:19:21 -04:00
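
Note: a rough sketch of the shape change described above, written as Python dicts mirroring the dashboard JSON. Only the target object with datasource, expr, refId, and range comes from the commit message; the annotation name, datasource uid, and PromQL expression are placeholders, not taken from the repo.

import json

# Pre-Grafana-12 shape: expr/step sat at the top level of the annotation entry,
# which Grafana 12 silently ignores.
old_annotation = {
    "name": "llama-cpp events",  # placeholder name
    "datasource": {"type": "prometheus", "uid": "prometheus"},  # placeholder uid
    "enable": True,
    "expr": 'changes(process_start_time_seconds{job="llama-cpp"}[5m]) > 0',  # placeholder query
    "step": "60s",
}

# Grafana 12 shape: the query is wrapped in a 'target' object with datasource,
# expr, refId, and range fields.
new_annotation = {
    "name": "llama-cpp events",
    "datasource": {"type": "prometheus", "uid": "prometheus"},
    "enable": True,
    "target": {
        "datasource": {"type": "prometheus", "uid": "prometheus"},
        "expr": 'changes(process_start_time_seconds{job="llama-cpp"}[5m]) > 0',
        "refId": "Anno",
        "range": True,
    },
}

print(json.dumps(new_annotation, indent=2))
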
7ee55eca6b typo: systemd.service -> systemd.services
Some checks failed
Build and Deploy / deploy (push) Failing after 15m58s
2026-04-09 20:48:06 -04:00
100999734b ddns-updater: disable DynamicUser to fix secret perms
Some checks failed
Build and Deploy / deploy (push) Failing after 10s
2026-04-09 20:47:04 -04:00
ce1c335230 caddy: wildcard TLS via DNS-01 challenge + ddns-updater for Njalla
Some checks failed
Build and Deploy / deploy (push) Failing after 31m3s
Build Caddy with the caddy-dns/njalla plugin to enable DNS-01 ACME
challenges. This issues a single wildcard certificate for
*.sigkill.computer instead of per-subdomain certificates, reducing
Let's Encrypt API calls and certificate management overhead.

Add a ddns-updater service (nixpkgs services.ddns-updater) configured
with the Njalla provider to automatically update DNS records when the
server's public IP changes.
2026-04-09 19:54:57 -04:00
e9ce1ce0a2 grafana: replace llama-cpp-annotations daemon with prometheus query 2026-04-09 19:54:57 -04:00
a3a6700106 grafana: replace disk-usage-collector with prometheus-zfs-exporter
The custom disk-usage-collector shell script and its per-minute timer are replaced
by prometheus-zfs-exporter (pdf/zfs_exporter, packaged in nixpkgs as
services.prometheus.exporters.zfs). The exporter provides pool capacity
metrics (allocated/free/size) natively.

Partition metrics (/boot, /persistent, /nix) now use node_exporter's
built-in filesystem collector (node_filesystem_*_bytes) which already
runs and collects these metrics.

Also fixes a latent race condition in serviceMountWithZpool: the -mounts
service now orders after zfs-mount.service (which runs 'zfs mount -a'),
not just after pool import. Without this, the mount check could run
before datasets are actually mounted.
2026-04-09 19:54:57 -04:00
75319256f3 lib: add mkCaddyReverseProxy, mkFail2banJail, mkGrafanaAnnotationService, extractArrApiKey 2026-04-09 19:54:57 -04:00
c74d356595 xmrig: compile with compiler optimizations
All checks were successful
Build and Deploy / deploy (push) Successful in 2m45s
2026-04-09 16:25:30 -04:00
ae03c2f288 p2pool: don't disable on power loss
p2pool is very light on resources; it's xmrig that should be disabled
2026-04-09 14:44:13 -04:00
0d87f90657 gitea: make gitea-runner wait for gitea.service
Some checks failed
Build and Deploy / deploy (push) Failing after 4m18s
Prevents notification spam on ntfy.
2026-04-09 14:16:05 -04:00
d1e9c92423 update
Some checks failed
Build and Deploy / deploy (push) Failing after 4s
2026-04-09 14:03:34 -04:00
4f33b16411 llama.cpp: thing 2026-04-09 14:02:53 -04:00
4f41789995 Reapply "llama-cpp: enable"
All checks were successful
Build and Deploy / deploy (push) Successful in 6m43s
This reverts commit 645a532ed7.
2026-04-07 22:49:53 -04:00
c0390af1a4 llama-cpp: update
All checks were successful
Build and Deploy / deploy (push) Successful in 2m33s
2026-04-07 22:29:02 -04:00
98310f2582 organize patches + add gemma4 patch
All checks were successful
Build and Deploy / deploy (push) Successful in 2m41s
2026-04-07 20:57:54 -04:00
645a532ed7 Revert "llama-cpp: enable"
All checks were successful
Build and Deploy / deploy (push) Successful in 1m52s
This reverts commit fdc1596bce.
2026-04-07 20:23:48 -04:00
2884a39eb1 llama-cpp: patch for vulkan support instead
All checks were successful
Build and Deploy / deploy (push) Successful in 7m23s
2026-04-07 20:07:02 -04:00
fdc1596bce llama-cpp: enable
All checks were successful
Build and Deploy / deploy (push) Successful in 7m16s
2026-04-07 19:15:56 -04:00
778b04a80f Reapply "llama-cpp: maybe use vulkan?"
All checks were successful
Build and Deploy / deploy (push) Successful in 2m17s
This reverts commit 9addb1569a.
2026-04-07 19:12:57 -04:00
88fc219f2d update 2026-04-07 19:11:50 -04:00
a5c7c91e38 Power: disable a bunch of things
All checks were successful
Build and Deploy / deploy (push) Successful in 1m42s
BROKE the Intel Arc A380 completely because it was forced into L1.1/L1.2
PCIe substates. Forcewaking the device would fail and it would never come up.

So I will be more conservative with power-saving tuning.
2026-04-07 19:08:08 -04:00
628c16fe64 fix git-crypt key for dotfiles workflow
All checks were successful
Build and Deploy / deploy (push) Successful in 2m32s
2026-04-07 13:51:19 -04:00
0df5d98770 grafana: use postgresql
All checks were successful
Build and Deploy / deploy (push) Successful in 2m45s
PostgreSQL isn't used for data, only for annotations and other internal state.
2026-04-07 12:44:59 -04:00
2848c7e897 grafana: keep data forever 2026-04-07 12:44:46 -04:00
e57c9cb83b xmrig-auto-pause: raise thresholds for server background load
All checks were successful
Build and Deploy / deploy (push) Successful in 1m59s
2026-04-07 01:09:16 -04:00
d48f27701f xmrig-auto-pause: add hysteresis to prevent stop/start thrashing
xmrig's RandomX pollutes the L3 cache, making other processes appear
~3-8% busier. With a single 5% threshold for both stopping and
resuming, the script oscillates: start xmrig -> cache pressure
inflates CPU -> stop xmrig -> CPU drops -> restart -> repeat.

Split into CPU_STOP_THRESHOLD (15%) and CPU_RESUME_THRESHOLD (5%).
The stop threshold sits above xmrig's indirect pressure, so only
genuine workloads trigger a pause. The resume threshold confirms the
system is truly idle before restarting.
2026-04-07 01:09:06 -04:00
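
Note: a minimal sketch of the hysteresis described above, not the actual script. It assumes psutil for CPU sampling and that xmrig runs as the systemd unit xmrig.service; the two threshold values are the ones named in the commit, while the non-nice sampling, check interval, and unit name are illustrative assumptions.

#!/usr/bin/env python3
# Sketch only: stop/resume hysteresis with separate thresholds.
import subprocess
import time

import psutil

CPU_STOP_THRESHOLD = 15.0   # % busy above which xmrig is stopped (above its own cache-pressure noise)
CPU_RESUME_THRESHOLD = 5.0  # % busy below which the system counts as truly idle
CHECK_INTERVAL = 30.0       # seconds between checks (illustrative)


def busy_cpu_percent(interval: float = 5.0) -> float:
    # Approximate "real" load by ignoring nice time, so the niced xmrig process
    # itself is mostly excluded from the measurement (assumption, not the real script).
    t = psutil.cpu_times_percent(interval=interval)
    return 100.0 - t.idle - t.nice


def set_xmrig(start: bool) -> None:
    subprocess.run(["systemctl", "start" if start else "stop", "xmrig.service"], check=False)


paused = False
while True:
    busy = busy_cpu_percent()
    if not paused and busy > CPU_STOP_THRESHOLD:
        # Only genuine workloads push past the higher stop threshold.
        set_xmrig(False)
        paused = True
    elif paused and busy < CPU_RESUME_THRESHOLD:
        # Resume only once the system is clearly idle; the gap between the two
        # thresholds is what prevents the stop/start oscillation.
        set_xmrig(True)
        paused = False
    time.sleep(CHECK_INTERVAL)
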
738861fd53 lanzaboote: fix was upstreamed 2026-04-06 19:21:20 -04:00
274ef40ccc lanzaboote: pin to fork with pcrlock reinstall fix
Some checks failed
Build and Deploy / deploy (push) Failing after 3h15m29s
Upstream PR: https://github.com/nix-community/lanzaboote/pull/566
2026-04-06 16:08:57 -04:00
a76a7969d9 nix-cache
Some checks failed
Build and Deploy / deploy (push) Failing after 1h17m39s
2026-04-06 14:21:31 -04:00
4be2eaed35 Reapply "update"
Some checks failed
Build and Deploy / deploy (push) Failing after 10m49s
This reverts commit 655bbda26f.
2026-04-06 13:40:52 -04:00
655bbda26f Revert "update"
All checks were successful
Build and Deploy / deploy (push) Successful in 1m19s
This reverts commit 960259b0d0.
2026-04-06 13:39:32 -04:00
3b8aedd502 fix hardened kernel with nix sandbox 2026-04-06 13:36:38 -04:00
960259b0d0 update
Some checks failed
Build and Deploy / deploy (push) Failing after 2m14s
2026-04-06 13:12:50 -04:00
5fa6f37b28 llama-cpp: disable 2026-04-06 13:12:06 -04:00
7afd1f35d2 xmrig-auto-pause: fix 2026-04-06 13:11:54 -04:00
a12dcb01ec llama-cpp: remove folder 2026-04-06 12:48:28 -04:00
6d47f02a0f llama-cpp: set batch size to 4096
All checks were successful
Build and Deploy / deploy (push) Successful in 1m22s
2026-04-06 02:29:37 -04:00
9addb1569a Revert "llama-cpp: maybe use vulkan?"
This reverts commit 0a927ea893.
2026-04-06 02:28:26 -04:00
df04e36b41 llama-cpp: fix vulkan cache
Some checks failed
Build and Deploy / deploy (push) Failing after 1m23s
2026-04-06 02:23:29 -04:00
0a927ea893 llama-cpp: maybe use vulkan?
All checks were successful
Build and Deploy / deploy (push) Successful in 8m30s
2026-04-06 02:12:46 -04:00
3e46c5bfa5 llama-cpp: use turbo3 for everything
All checks were successful
Build and Deploy / deploy (push) Successful in 1m20s
2026-04-06 01:53:11 -04:00
06aee5af77 llama-cpp: gemma 4 E4B -> gemma 4 E2B
All checks were successful
Build and Deploy / deploy (push) Successful in 2m5s
2026-04-06 01:24:25 -04:00
8fddd3a954 llama-cpp: context: 32768 -> 65536
All checks were successful
Build and Deploy / deploy (push) Successful in 2m58s
2026-04-06 01:04:23 -04:00
0e4f0d3176 llama-cpp: fix model name
All checks were successful
Build and Deploy / deploy (push) Successful in 1m18s
2026-04-06 00:59:20 -04:00
bbcd662c28 xmrig-auto-pause: fix stuck state after external restart, add startup cooldown
All checks were successful
Build and Deploy / deploy (push) Successful in 8m47s
Two bugs found during live verification on the server:

1. Stuck state after external restart: if something else restarted xmrig
   (e.g. deploy-rs activation) while paused_by_us=True, the script never
   detected this and became permanently stuck — unable to stop xmrig on
   future load because it thought xmrig was already stopped.

   Fix: when paused_by_us=True and busy, check if xmrig is actually
   running. If so, reset paused_by_us=False and re-stop it.

2. Flapping on xmrig restart: RandomX dataset init takes ~3.7s of intense
   non-nice CPU, which the script detected as real workload and immediately
   re-stopped xmrig after every restart, creating a start-stop loop.

   Fix: add STARTUP_COOLDOWN (default 10s) — after starting xmrig, skip
   CPU checks until the cooldown expires.

Both bugs were present in production: the script had been stuck since
Apr 3 (2+ days) with xmrig running unmanaged alongside llama-server.
2026-04-05 23:20:47 -04:00
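
Note: a sketch of how the two fixes might layer onto the same pause/resume loop, again not the actual script. Only the paused_by_us recovery check and the STARTUP_COOLDOWN behaviour come from the commit message; the unit name, thresholds, and sampling helper are carried over from the earlier sketch and remain assumptions.

#!/usr/bin/env python3
# Sketch only: stuck-state recovery + startup cooldown on top of the pause loop.
import subprocess
import time

import psutil

CPU_STOP_THRESHOLD = 15.0
CPU_RESUME_THRESHOLD = 5.0
STARTUP_COOLDOWN = 10.0  # seconds of CPU checks to skip after starting xmrig
CHECK_INTERVAL = 30.0


def busy_cpu_percent(interval: float = 5.0) -> float:
    t = psutil.cpu_times_percent(interval=interval)
    return 100.0 - t.idle - t.nice


def xmrig_running() -> bool:
    return subprocess.run(["systemctl", "is-active", "--quiet", "xmrig.service"]).returncode == 0


def set_xmrig(start: bool) -> None:
    subprocess.run(["systemctl", "start" if start else "stop", "xmrig.service"], check=False)


paused_by_us = False
skip_until = 0.0  # monotonic deadline; no CPU checks during RandomX dataset init

while True:
    now = time.monotonic()
    if now < skip_until:
        # Fix 2: startup cooldown -- don't mistake dataset init for a real workload.
        time.sleep(CHECK_INTERVAL)
        continue

    busy = busy_cpu_percent()
    if busy > CPU_STOP_THRESHOLD:
        if not paused_by_us:
            set_xmrig(False)
            paused_by_us = True
        elif xmrig_running():
            # Fix 1: we believed xmrig was stopped, but something external
            # (e.g. deploy-rs activation) restarted it. Re-stop it instead of
            # staying stuck with a stale paused_by_us flag.
            set_xmrig(False)
    elif paused_by_us and busy < CPU_RESUME_THRESHOLD:
        set_xmrig(True)
        paused_by_us = False
        skip_until = now + STARTUP_COOLDOWN
    time.sleep(CHECK_INTERVAL)
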