This repository was archived on 2026-04-18. It is read-only: you can view and clone files, but you cannot open issues, create pull requests, or push commits.
The llama.cpp server has a built-in /metrics endpoint exposing prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total, n_decode_total, and n_busy_slots_per_decode. Enable it with the --metrics flag and add a Prometheus scrape target; no external metric collector is needed for LLM inference monitoring.
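A minimal sketch of the corresponding Prometheus scrape configuration, assuming the server listens on the default host and port 8080 (adjust the target and job name to your deployment):

```yaml
scrape_configs:
  - job_name: "llama-cpp"        # hypothetical job name
    metrics_path: /metrics       # endpoint exposed when --metrics is passed
    static_configs:
      - targets: ["localhost:8080"]  # assumed llama.cpp server address
```

With this in place, the token-throughput and slot-utilization metrics listed above appear directly in Prometheus without any sidecar exporter.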
Languages: Nix 83.1%, Python 15.4%, Shell 1.5%