9addb1569a
Revert "llama-cpp: maybe use vulkan?"
This reverts commit 0a927ea893.
2026-04-06 02:28:26 -04:00
df04e36b41
llama-cpp: fix vulkan cache
Build and Deploy / deploy (push) Failing after 1m23s
2026-04-06 02:23:29 -04:00
0a927ea893
llama-cpp: maybe use vulkan?
Build and Deploy / deploy (push) Successful in 8m30s
2026-04-06 02:12:46 -04:00
3e46c5bfa5
llama-cpp: use turbo3 for everything
Build and Deploy / deploy (push) Successful in 1m20s
2026-04-06 01:53:11 -04:00
06aee5af77
llama-cpp: gemma 4 E4B -> gemma 4 E2B
Build and Deploy / deploy (push) Successful in 2m5s
2026-04-06 01:24:25 -04:00
8fddd3a954
llama-cpp: context: 32768 -> 65536
Build and Deploy / deploy (push) Successful in 2m58s
2026-04-06 01:04:23 -04:00
0e4f0d3176
llama-cpp: fix model name
Build and Deploy / deploy (push) Successful in 1m18s
2026-04-06 00:59:20 -04:00
8ea96c8b8e
llama-cpp: fix model hash
Build and Deploy / deploy (push) Successful in 2m36s
2026-04-04 00:28:07 -04:00
479ec43b8f
llama-cpp: integrate native prometheus /metrics endpoint
The llama.cpp server has a built-in /metrics endpoint exposing
prompt_tokens_seconds, predicted_tokens_seconds, tokens_predicted_total,
n_decode_total, and n_busy_slots_per_decode. Enable it with --metrics
and add a Prometheus scrape target; this removes the need for any external
metric collection for LLM inference monitoring.
2026-04-03 15:19:11 -04:00
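The scrape target mentioned in the commit above could look like the following minimal sketch. The job name and listen address are illustrative assumptions, not taken from this repository's actual deployment; llama-server must have been started with --metrics for the endpoint to respond.

```yaml
# Sketch of a Prometheus scrape job for the llama.cpp server's built-in
# /metrics endpoint. Host/port and job name are assumptions for illustration.
scrape_configs:
  - job_name: llama-cpp
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:8080"]  # adjust to wherever llama-server listens
```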
47aeb58f7a
llama-cpp: do logging
Build and Deploy / deploy (push) Successful in 2m27s
2026-04-03 14:39:46 -04:00
daf82c16ba
fix xmrig pause
2026-04-03 14:39:20 -04:00
d4d01d63f1
llama-cpp: update + re-enable + gemma 4 E4B
Build and Deploy / deploy (push) Failing after 20m16s
2026-04-03 14:06:35 -04:00
124d33963e
organize
Build and Deploy / deploy (push) Successful in 2m43s
2026-04-03 00:47:12 -04:00