server-config/services/llama-cpp
commit 479ec43b8f (Simon Gardling, 2026-04-03 15:19:11 -04:00)
llama-cpp: integrate native prometheus /metrics endpoint
The llama.cpp server has a built-in /metrics endpoint exposing
prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total,
n_decode_total, and n_busy_slots_per_decode. Enable it with the --metrics
flag and add a Prometheus scrape target; this removes the need for any
external metric collection for LLM inference monitoring.
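A minimal sketch of the scrape side, assuming llama-server was started
with --metrics and listens on its default port 8080 (e.g.
`llama-server -m model.gguf --port 8080 --metrics`); the job name and
target below are placeholders to adapt to the actual deployment:

```yaml
# prometheus.yml fragment: scrape the llama.cpp server's built-in
# /metrics endpoint. Host and port are assumptions; point the target
# at wherever llama-server actually listens.
scrape_configs:
  - job_name: 'llama-cpp'
    # /metrics is Prometheus's default metrics_path, so no override needed.
    static_configs:
      - targets: ['localhost:8080']
```

Once Prometheus is scraping the target, the counters listed above can be
graphed with rate(); note that the exported series carry a llamacpp:
prefix, so a query along the lines of
rate(llamacpp:tokens_predicted_total[5m]) tracks generation throughput.
Check the raw /metrics output for the exact identifiers.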