Two bugs found during live verification on the server:
1. Stuck state after external restart: if something else restarted xmrig
(e.g. deploy-rs activation) while paused_by_us=True, the script never
detected this and became permanently stuck — unable to stop xmrig on
future load because it thought xmrig was already stopped.
Fix: when paused_by_us=True and busy, check if xmrig is actually
running. If so, reset paused_by_us=False and re-stop it.
2. Flapping on xmrig restart: RandomX dataset init takes ~3.7s of intense
non-nice CPU, which the script detected as real workload and immediately
re-stopped xmrig after every restart, creating a start-stop loop.
Fix: add STARTUP_COOLDOWN (default 10s) — after starting xmrig, skip
CPU checks until the cooldown expires.
Both bugs were present in production: the script had been stuck since
Apr 3 (2+ days) with xmrig running unmanaged alongside llama-server.
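Both fixes can be sketched as pure helpers; the function and variable names below are illustrative, not the script's actual identifiers:

```python
STARTUP_COOLDOWN = 10.0  # seconds; the default described above

def reconcile_paused_state(paused_by_us, busy, xmrig_running):
    """Bug 1 fix: we believe we stopped xmrig, but something else
    (e.g. deploy-rs activation) restarted it. Reset the bookkeeping
    and ask for a fresh stop. Returns (paused_by_us, must_stop)."""
    if paused_by_us and busy and xmrig_running:
        return False, True
    return paused_by_us, False

def should_check_cpu(last_xmrig_start, now, cooldown=STARTUP_COOLDOWN):
    """Bug 2 fix: skip CPU checks while RandomX dataset init
    (~3.7s of non-nice CPU) would be mistaken for real workload."""
    return last_xmrig_start is None or (now - last_xmrig_start) >= cooldown
```

Keeping both checks as side-effect-free functions makes the stuck-state and flapping cases easy to unit-test without a running xmrig.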
llama.cpp server has a built-in /metrics endpoint exposing
prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total,
n_decode_total, and n_busy_slots_per_decode. Enable it with --metrics
and add a Prometheus scrape target; this removes the need for any
external metric collection for LLM inference monitoring.
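For reference, a minimal sketch of parsing that text exposition format; the real llama-server output may prefix metric names (e.g. with llamacpp:), so the sample here is illustrative:

```python
def parse_prom_text(text):
    """Parse Prometheus text exposition into {name: value},
    ignoring HELP/TYPE comments and any label sets."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        name = name.split("{", 1)[0]  # drop labels if present
        try:
            metrics[name] = float(value)
        except ValueError:
            pass
    return metrics

sample = """\
# HELP prompt_tokens_seconds average prompt throughput
prompt_tokens_seconds 123.4
predicted_tokens_seconds 45.6
n_busy_slots_per_decode 0.5
"""
```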
overrideDerivation has been deprecated since 2019. Its replacement,
overrideAttrs, properly handles the env attribute set used by
modern derivations, avoiding the NIX_CFLAGS_COMPILE overlap
error between env and top-level derivation arguments.
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
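A sketch of what a Bearer-authenticated client call looks like (URL and token are placeholders; llama-server checks the token when started with its --api-key flag):

```python
import urllib.request

def authed_request(url, token):
    """Build a GET request carrying the Bearer token that
    llama-server expects when API-key auth is enabled."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req
```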
Add textfile collector for ZFS pool utilization (tank, hdds) and
boot drive partitions (/boot, /persistent, /nix). Runs every 60s.
Add two Grafana dashboard panels: ZFS Pool Utilization and Boot
Drive Partitions as Row 5.
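The collector's write step can be sketched like this (metric and file names are hypothetical); writing to a temp file and renaming keeps node_exporter from ever scraping a half-written file:

```python
import os
import tempfile

def write_textfile_metrics(directory, name, samples):
    """Write a .prom file for node_exporter's textfile collector.
    Write to a temp file first, then rename into place."""
    path = os.path.join(directory, name + ".prom")
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        for metric, value in samples.items():
            f.write(f"{metric} {value}\n")
    os.replace(tmp, path)  # atomic on the same filesystem
    return path
```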
openrouter was broken: !cat + nix store path is not valid omp config.
Use builtins.readFile to inline the key at eval time.
Add self-hosted llama.cpp provider at llm.sigkill.computer with
Bearer token auth.
Add trilium-server on port 8787 behind Caddy reverse proxy at
notes.sigkill.computer. Data stored on ZFS tank pool with
serviceMountWithZpool for mount ordering.
Poll /slots endpoint, create annotations when slots start processing,
close with token count when complete. Includes NixOS VM test with
mock llama-cpp and grafana servers. Dashboard annotation entry added.
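A sketch of the body the poller might POST to Grafana's /api/annotations endpoint; the tags and text shown are illustrative, not the sidecar's actual values:

```python
def annotation_payload(slot_id, start_ms, end_ms=None, tokens=None):
    """Build a Grafana annotation body. An open annotation marks
    processing start; passing end_ms closes it as a region and
    records the token count."""
    body = {
        "time": start_ms,
        "tags": ["llama-cpp", f"slot-{slot_id}"],
        "text": "processing started",
    }
    if end_ms is not None:
        body["timeEnd"] = end_ms
        body["text"] = f"done: {tokens} tokens"
    return body
```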
Wrap entire read_one_sample() in try/except to handle all failures
(missing binary, permission errors, malformed JSON, timeouts).
Write zero-valued metrics on failure instead of exiting non-zero.
Increase timeout from 5s to 8s for slower GPU initialization.
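The fail-safe wrapper amounts to something like this (names are illustrative):

```python
def safe_sample(read_one_sample, zero_metrics):
    """Run the sampler; on any failure (missing binary, permission
    error, malformed JSON, timeout) return zero-valued metrics so
    the exporter writes zeros instead of exiting non-zero."""
    try:
        return read_one_sample()
    except Exception:
        return dict(zero_metrics)
```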
Add sidecar service that polls llama-cpp /slots endpoint every 3s.
When any slot is processing, it stops xmrig, then restarts it after a
10s grace period once all slots are idle again. Handles an unreachable
llama-cpp gracefully (leaves xmrig untouched).
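The per-tick decision can be sketched as a pure function, assuming each entry in the /slots response carries an is_processing flag (the action names here are illustrative):

```python
GRACE_PERIOD = 10.0  # seconds all slots must stay idle

def decide(slots, idle_since, now):
    """One 3s poll tick. slots is the parsed /slots response, or
    None if llama-cpp was unreachable. Returns (action, idle_since)
    where action is 'stop', 'start', or 'none'."""
    if slots is None:
        return "none", idle_since      # unreachable: leave xmrig alone
    if any(s.get("is_processing") for s in slots):
        return "stop", None            # inference running: pause miner
    if idle_since is None:
        return "none", now             # all idle: start grace timer
    if now - idle_since >= GRACE_PERIOD:
        return "start", idle_since     # idle long enough: resume
    return "none", idle_since
```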