llama-cpp: add grafana annotations for inference requests
Poll the llama-cpp /slots endpoint, create a Grafana annotation when a slot starts processing, and close it with the token count when processing completes. Includes a NixOS VM test with mock llama-cpp and Grafana servers. Adds a dashboard annotation entry.
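The start/close logic described above can be sketched as a pure function that diffs two successive polls of /slots and emits annotation events. This is a minimal sketch, not the commit's actual code: the slot field names (`is_processing`, `n_decoded`) and the event tuples are assumptions.

```python
def slot_events(prev, curr):
    """Compare two successive /slots polls and emit annotation events.

    `prev` and `curr` map slot id -> slot state dict. Field names
    (`is_processing`, `n_decoded`) are assumed, not taken from the commit.
    Returns a list of ("start", slot_id, None) and
    ("stop", slot_id, token_count) tuples.
    """
    events = []
    for sid, slot in curr.items():
        was_busy = prev.get(sid, {}).get("is_processing", False)
        now_busy = slot.get("is_processing", False)
        if now_busy and not was_busy:
            # Slot picked up a request: open a Grafana annotation.
            events.append(("start", sid, None))
        elif was_busy and not now_busy:
            # Slot finished: close the annotation with the token count.
            events.append(("stop", sid, slot.get("n_decoded", 0)))
    return events
```

A poller would call this on each interval, then create or update annotations via Grafana's HTTP API for each event.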
@@ -48,6 +48,7 @@
     ./services/soulseek.nix
     ./services/llama-cpp.nix
+    ./services/llama-cpp-annotations.nix
     ./services/ups.nix
     ./services/monitoring.nix
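The diff above only imports the new module; enabling it in a host configuration might look like the following. This is a hypothetical sketch: the option names under `services.llama-cpp-annotations` and their defaults are assumptions, not taken from the commit.

```nix
{
  # Hypothetical options; the real ones are defined in
  # ./services/llama-cpp-annotations.nix and may differ.
  services.llama-cpp-annotations = {
    enable = true;
    slotsUrl = "http://127.0.0.1:8080/slots"; # llama-cpp server to poll
    grafanaUrl = "http://127.0.0.1:3000";     # Grafana instance receiving annotations
    pollInterval = 2;                         # seconds between /slots polls
  };
}
```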