Arc A380 GPU (07:00.0) becomes unreachable (MMIO returns 0xFFFFFFFF)
when PCIe ASPM powersupersave puts it into L1.1/L1.2 substates.
Both i915 and xe drivers hit the same hardware failure.
Fix: disable runtime PM for the GPU in power-tune, which runs after
powertop so the override sticks. Use the i915 driver (xe has iHD buffer
mapping failures on this GPU/kernel 6.12 combination).
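A minimal sketch of the override described above, assuming it is done through the kernel's per-device sysfs knob `power/control` (writing `on` keeps the device out of runtime suspend). The function name, the `sysfs_root` parameter, and the `0000` PCI domain prefix are assumptions for illustration; the actual power-tune unit may do this differently.

```python
from pathlib import Path


def disable_runtime_pm(bdf: str, sysfs_root: str = "/sys") -> Path:
    """Write "on" to a PCI device's power/control file.

    bdf is the full PCI address, e.g. "0000:07:00.0" (domain assumed).
    sysfs_root is only a parameter so the function can be exercised
    against a scratch directory instead of the real /sys.
    """
    control = (
        Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "power" / "control"
    )
    # "auto" lets the kernel runtime-suspend the device; "on" forbids it.
    control.write_text("on\n")
    return control
```

This has to run after powertop's auto-tune, since that would otherwise set the same file back to `auto`.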
Add trilium-server on port 8787 behind Caddy reverse proxy at
notes.sigkill.computer. Data stored on ZFS tank pool with
serviceMountWithZpool for mount ordering.
Poll /slots endpoint, create annotations when slots start processing,
close with token count when complete. Includes NixOS VM test with
mock llama-cpp and grafana servers. Dashboard annotation entry added.
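The start/complete detection can be sketched as a diff between consecutive /slots polls. This is an illustration, not the code from the commit: the slot fields `id`, `is_processing` and `n_decoded` are assumed names for what the endpoint returns, and `slot_transitions` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def slot_transitions(
    active: Dict[int, int], slots: List[dict]
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Compare one /slots poll against the slots currently being tracked.

    active maps slot id -> last token count seen while processing and is
    updated in place. Returns (started_ids, [(finished_id, tokens), ...]).
    """
    started: List[int] = []
    finished: List[Tuple[int, int]] = []
    for slot in slots:
        sid = slot["id"]
        tokens = slot.get("n_decoded", 0)  # assumed field name
        if slot["is_processing"]:
            if sid not in active:
                started.append(sid)  # open an annotation for this slot
            active[sid] = tokens
        elif sid in active:
            # Slot went idle: close the annotation with the best count seen.
            finished.append((sid, max(active.pop(sid), tokens)))
    return started, finished
```

Each started id would map to a new Grafana annotation, and each finished pair to an update that sets the end time and token count.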
Add sidecar service that polls llama-cpp /slots endpoint every 3s.
When any slot is processing, stops xmrig. Restarts xmrig after 10s
grace period when all slots are idle. Handles unreachable llama-cpp
gracefully (leaves xmrig untouched).
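The pause/resume behaviour can be sketched as a small state machine driven by each poll result. This is an assumption-laden illustration: `PauseState` and `decide` are hypothetical names, and the caller is assumed to poll every 3s and translate the returned action into stopping or starting the xmrig unit.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 10.0  # grace period from the commit message


@dataclass
class PauseState:
    xmrig_stopped: bool = False
    idle_since: Optional[float] = None  # when all slots were first seen idle


def decide(state: PauseState, busy: Optional[bool], now: float) -> Optional[str]:
    """Return "stop", "start", or None for one poll result.

    busy is True if any slot is processing, False if all slots are idle,
    and None if llama-cpp could not be reached.
    """
    if busy is None:
        return None  # unreachable: leave xmrig untouched
    if busy:
        state.idle_since = None  # any activity resets the grace timer
        if not state.xmrig_stopped:
            state.xmrig_stopped = True
            return "stop"
        return None
    if not state.xmrig_stopped:
        return None  # idle and xmrig already running
    if state.idle_since is None:
        state.idle_since = now
    if now - state.idle_since >= GRACE_SECONDS:
        state.xmrig_stopped = False
        state.idle_since = None
        return "start"
    return None
```

Keeping the decision separate from the HTTP poll and the service control makes the timing logic easy to check without a running llama-cpp.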
1. Smooth out power draw
- The UPS only reports in 9-watt increments, so smoothing gives more
relative detail on trends
2. Add Jellyfin integration
- Good for seeing correlations between statistics and Jellyfin streams
3. Add Intel GPU stats
- Provides info on GPU utilization
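The commit does not say which smoothing method is used for the power draw. As one possible illustration, a trailing moving average turns the 9-watt steps into a gradual ramp; the function name and the window size below are assumptions.

```python
from collections import deque
from typing import Iterable, List


def moving_average(samples: Iterable[float], window: int = 5) -> List[float]:
    """Trailing moving average over the last `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # drops the oldest sample automatically
    smoothed: List[float] = []
    for sample in samples:
        recent.append(sample)
        smoothed.append(sum(recent) / len(recent))
    return smoothed
```

With a window of 3, a step from 90 W to 99 W is spread over three samples instead of appearing as a single jump.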
- coturn: switch static-auth-secret to static-auth-secret-file
- matrix: switch registration_token and turn_secret to file-based
- murmur: switch password to environmentFile with agenix
- p2pool: move public wallet address to service-configs.nix
This gave me a lot of panic and grief. The JetKVM got NO monitor output,
and I was panicking and away from home. It was awful. After sitting
powered off for a few hours it fixed itself, in line with NVRAM state
draining over time.