llama-cpp: add grafana annotations for inference requests

Poll the /slots endpoint, create an annotation when a slot starts
processing, and close it with the token count when the slot completes.
Include a NixOS VM test with mock llama-cpp and Grafana servers, and
add a dashboard annotation entry.
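
A minimal sketch of one polling iteration as described above, not the
service's actual code. It assumes each slot object carries an `id`, a
boolean `is_processing`, and a token counter `n_decoded`; those field
names, the `step` function, and the `create`/`close` callbacks are
hypothetical stand-ins for whatever the real service uses to read
/slots and talk to Grafana.

```python
from typing import Callable, Dict, List


def step(
    active: Dict[int, int],
    slots: List[dict],
    create: Callable[[int, int], int],
    close: Callable[[int, int, int], None],
    now_ms: int,
) -> Dict[int, int]:
    """Compare the current slot list against the open annotations.

    `active` maps slot id -> annotation id for slots that are currently
    being tracked. It is updated in place and also returned.
    """
    for slot in slots:
        slot_id = slot["id"]
        # Assumed field name: truthy while the slot is serving a request.
        busy = bool(slot.get("is_processing"))
        if busy and slot_id not in active:
            # Slot just started processing: open an annotation.
            active[slot_id] = create(slot_id, now_ms)
        elif not busy and slot_id in active:
            # Slot just finished: close the annotation with the token count.
            # Assumed field name: number of tokens generated so far.
            tokens = int(slot.get("n_decoded", 0))
            close(active.pop(slot_id), now_ms, tokens)
    return active
```

Keeping the state transition separate from the HTTP calls means the
logic can be exercised with plain callbacks, which is roughly what a
mock-server test has to verify end to end.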
2026-04-02 17:43:49 -04:00
parent 0235617627
commit 9baeaa5c23
6 changed files with 362 additions and 0 deletions

@@ -28,6 +28,9 @@ in
# zfs scrub annotations test
zfsScrubAnnotationsTest = handleTest ./zfs-scrub-annotations.nix;
# llama-cpp annotation service test
llamaCppAnnotationsTest = handleTest ./llama-cpp-annotations.nix;
# ntfy alerts test
ntfyAlertsTest = handleTest ./ntfy-alerts.nix;