nixos/services/llama-cpp/llama-cpp.nix at 479ec43b8fe7a2d2f73d14a241ffc098d67d9528

Simon Gardling 479ec43b8f llama-cpp: integrate native prometheus /metrics endpoint

The llama.cpp server has a built-in /metrics endpoint that exposes
prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total,
n_decode_total, and n_busy_slots_per_decode. Enable it with --metrics
and add a Prometheus scrape target, which removes the need for external
metric collection when monitoring LLM inference.
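
The change described above might look roughly like the following. This is a minimal sketch rather than the contents of this commit: it assumes the upstream nixpkgs `services.llama-cpp` and `services.prometheus` modules, and the model path, host, and port are hypothetical placeholders.

```nix
{ config, ... }:
{
  services.llama-cpp = {
    enable = true;
    # Hypothetical model path; substitute the model actually deployed.
    model = "/var/lib/llama-cpp/model.gguf";
    # Assumed listen address and port for this sketch.
    host = "127.0.0.1";
    port = 8080;
    # Turn on the server's built-in Prometheus endpoint at /metrics.
    extraFlags = [ "--metrics" ];
  };

  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "llama-cpp";
        # Prometheus scrapes /metrics by default, so no metrics_path is set.
        static_configs = [
          { targets = [ "127.0.0.1:${toString config.services.llama-cpp.port}" ]; }
        ];
      }
    ];
  };
}
```

Deriving the scrape target from `config.services.llama-cpp.port` keeps the two services in step if the port changes later.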

2026-04-03 15:19:11 -04:00

1.7 KiB
