479ec43b8fe7a2d2f73d14a241ffc098d67d9528
The llama.cpp server has a built-in, Prometheus-compatible /metrics endpoint that exposes prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total, n_decode_total, and n_busy_slots_per_decode. Enable it with --metrics and add a Prometheus scrape target for it. This removes the need for a separate exporter to monitor LLM inference.
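A minimal sketch of the change as a NixOS module fragment. It assumes the stock nixpkgs `services.llama-cpp` and `services.prometheus` modules. The model path and the `llama-cpp` job name are hypothetical, and the actual host configs in this flake may wire things differently. Prometheus scrapes `/metrics` by default, so the job does not need to set `metrics_path`.

```nix
{ config, ... }:
{
  # Assumes the nixpkgs services.llama-cpp module.
  services.llama-cpp = {
    enable = true;
    model = "/var/lib/llama-cpp/models/model.gguf"; # hypothetical model path
    host = "127.0.0.1";
    port = 8080;
    # Pass --metrics through to the llama.cpp server so it serves /metrics.
    extraFlags = [ "--metrics" ];
  };

  # Scrape the llama.cpp server directly; no separate exporter is involved.
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "llama-cpp"; # hypothetical job name
        static_configs = [
          {
            targets = [ "127.0.0.1:${toString config.services.llama-cpp.port}" ];
          }
        ];
      }
    ];
  };
}
```

Deriving the target port from `config.services.llama-cpp.port` keeps the scrape target in sync if the server port changes.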
Description: Unified NixOS flake for mreow, yarn, muffin
Languages: Nix 84.6%, Python 10.7%, Emacs Lisp 2.6%, Shell 2.1%