The llama.cpp server has a built-in /metrics endpoint exposing
prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total,
n_decode_total, and n_busy_slots_per_decode. Enable it with --metrics
and add a Prometheus scrape target. This removes the need for any
external metric collection for LLM inference monitoring.
overrideDerivation has been deprecated since 2019. The new
overrideAttrs properly handles the env attribute set used by
modern derivations to avoid the NIX_CFLAGS_COMPILE overlap
error between env and top-level derivation arguments.
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
Add textfile collector for ZFS pool utilization (tank, hdds) and
boot drive partitions (/boot, /persistent, /nix). Runs every 60s.
Add two Grafana dashboard panels: ZFS Pool Utilization and Boot
Drive Partitions as Row 5.
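A minimal sketch of the pool half of such a textfile collector, not the
script from this commit. It assumes the input comes from
`zpool list -Hp -o name,size,allocated` (tab-separated, exact byte
counts). The metric names, function names, and the output path handling
are hypothetical.

```python
import os


def render_zpool_metrics(zpool_output):
    """Turn tab-separated `name size allocated` rows into Prometheus text.

    Metric names here are illustrative, not the ones used by the real collector.
    """
    lines = []
    for row in zpool_output.strip().splitlines():
        name, size, alloc = row.split("\t")
        lines.append(f'zpool_size_bytes{{pool="{name}"}} {int(size)}')
        lines.append(f'zpool_allocated_bytes{{pool="{name}"}} {int(alloc)}')
    return "\n".join(lines) + "\n"


def write_atomic(path, text):
    """Write to a temp file, then rename, so a scrape never sees a partial file."""
    tmp = f"{path}.tmp"
    with open(tmp, "w") as fh:
        fh.write(text)
    os.replace(tmp, path)
```

The rename step matters because the node exporter may read the file at
any moment; utilization can then be computed in the dashboard as
allocated divided by size.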
The openrouter provider was broken: !cat plus a nix store path is not
valid omp config. Use builtins.readFile to inline the key at eval time.
Add self-hosted llama.cpp provider at llm.sigkill.computer with
Bearer token auth.
Add trilium-server on port 8787 behind a Caddy reverse proxy at
notes.sigkill.computer. Data is stored on the ZFS tank pool, with
serviceMountWithZpool handling mount ordering.
Poll /slots endpoint, create annotations when slots start processing,
close with token count when complete. Includes NixOS VM test with
mock llama-cpp and grafana servers. Dashboard annotation entry added.
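The open/close logic above can be sketched as a pure function over two
consecutive polls. This is an illustration under assumptions, not the
service's code: the slot fields `id` and `is_processing` are assumed
from the /slots response, `n_decoded` is a hypothetical flat token-count
field, and `diff_slots` is an invented name.

```python
def diff_slots(open_ids, slots):
    """Compare currently open annotations against the latest /slots poll.

    open_ids: set of slot ids that already have an open annotation.
    slots:    list of dicts as parsed from the /slots JSON (assumed shape).
    Returns (to_open, to_close), where to_close holds (slot_id, token_count).
    """
    to_open, to_close = [], []
    for slot in slots:
        sid = slot["id"]
        busy = bool(slot.get("is_processing"))
        if busy and sid not in open_ids:
            to_open.append(sid)  # slot just started processing
        elif not busy and sid in open_ids:
            # slot finished; n_decoded is a hypothetical field name
            to_close.append((sid, slot.get("n_decoded", 0)))
    return to_open, to_close
```

Keeping the diff separate from the HTTP calls is what makes it easy to
exercise against mock servers.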
Wrap entire read_one_sample() in try/except to handle all failures
(missing binary, permission errors, malformed JSON, timeouts).
Write zero-valued metrics on failure instead of exiting non-zero.
Increase timeout from 5s to 8s for slower GPU initialization.
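A rough sketch of the fallback pattern described above, assuming the
sample comes from an external command that prints one JSON object. Only
read_one_sample() is named in this commit; its signature here, plus
`safe_sample`, `render_metrics`, and the `gpu_` prefix, are hypothetical.

```python
import json
import subprocess


def read_one_sample(cmd, timeout=8.0):
    """Run cmd and parse its stdout as JSON. Raises on any failure."""
    proc = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return json.loads(proc.stdout)


def safe_sample(cmd, keys, timeout=8.0):
    """Return {key: float}; fall back to zeros instead of propagating errors."""
    try:
        sample = read_one_sample(cmd, timeout)
        return {k: float(sample[k]) for k in keys}
    except Exception:
        # Missing binary, permission error, malformed JSON, timeout, missing key
        return {k: 0.0 for k in keys}


def render_metrics(values, prefix="gpu_"):
    """Format values as Prometheus text lines (prefix is illustrative)."""
    return "".join(f"{prefix}{k} {v}\n" for k, v in sorted(values.items()))
```

Catching the broad Exception is deliberate here: the collector should
keep emitting a well-formed file so the exporter never sees a stale or
missing one.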
Add sidecar service that polls llama-cpp /slots endpoint every 3s.
When any slot is processing, stops xmrig. Restarts xmrig after 10s
grace period when all slots are idle. Handles unreachable llama-cpp
gracefully (leaves xmrig untouched).
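The stop/restart rules above can be sketched as a small decision
function. This is an illustration, not the sidecar's code: `decide` and
its arguments are hypothetical, and the caller is assumed to pass
busy=None when llama-cpp could not be reached.

```python
def decide(busy, xmrig_running, idle_since, now, grace=10.0):
    """Decide what to do with xmrig after one /slots poll.

    busy:          True/False if any slot is processing, None if unreachable.
    xmrig_running: whether xmrig is currently running.
    idle_since:    timestamp when all slots were first seen idle, or None.
    Returns (action, idle_since) with action in {"stop", "start", None}.
    """
    if busy is None:
        # llama-cpp unreachable: leave xmrig untouched
        return None, idle_since
    if busy:
        # Any slot processing: stop xmrig and reset the idle timer
        return ("stop" if xmrig_running else None), None
    if idle_since is None:
        idle_since = now  # first idle poll starts the grace period
    if not xmrig_running and now - idle_since >= grace:
        return "start", idle_since
    return None, idle_since
```

A polling loop would call this every 3s with time.monotonic() as `now`
and carry `idle_since` over to the next iteration.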
1. Smooth out power draw
- The UPS only reports in 9-watt increments, so smoothing gives more
relative detail on trends
2. Add Jellyfin integration
- Good for seeing correlations between statistics and Jellyfin streams
3. Add Intel GPU stats
- Provides info on GPU utilization
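To illustrate why smoothing helps with the coarse UPS readings in item
1, here is a trailing moving average over raw samples. This is only a
sketch of the idea; the dashboard may well do the smoothing in the
query layer instead, and `moving_average` and the window size are
hypothetical.

```python
from collections import deque


def moving_average(samples, window):
    """Trailing moving average; early points average over fewer samples."""
    buf = deque(maxlen=window)
    out = []
    for value in samples:
        buf.append(value)
        out.append(sum(buf) / len(buf))
    return out
```

With readings that jump between steps such as 9 and 18 watts, the
averaged series passes through intermediate values, which makes gradual
trends easier to see.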
- coturn: switch static-auth-secret to static-auth-secret-file
- matrix: switch registration_token and turn_secret to file-based
- murmur: switch password to environmentFile with agenix
- p2pool: move public wallet address to service-configs.nix