49 Commits

Author SHA1 Message Date
12469de580 llama.cpp: things 2026-04-11 10:27:38 -04:00
75319256f3 lib: add mkCaddyReverseProxy, mkFail2banJail, mkGrafanaAnnotationService, extractArrApiKey 2026-04-09 19:54:57 -04:00
4f33b16411 llama.cpp: thing 2026-04-09 14:02:53 -04:00
c0390af1a4 llama-cpp: update 2026-04-07 22:29:02 -04:00
98310f2582 organize patches + add gemma4 patch 2026-04-07 20:57:54 -04:00
2884a39eb1 llama-cpp: patch for vulkan support instead 2026-04-07 20:07:02 -04:00
778b04a80f Reapply "llama-cpp: maybe use vulkan?"
This reverts commit 9addb1569a.
2026-04-07 19:12:57 -04:00
a12dcb01ec llama-cpp: remove folder 2026-04-06 12:48:28 -04:00
124d33963e organize 2026-04-03 00:47:12 -04:00
c2ff07b329 llama-cpp: disable 2026-04-03 00:17:38 -04:00
ab9c12cb97 llama-cpp: general changes 2026-04-03 00:17:14 -04:00
0aeb6c5523 llama-cpp: add API key auth via --api-key-file
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
2026-04-02 18:02:23 -04:00
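A minimal sketch of the auth setup this commit message describes; the secret path, model path, and port are illustrative assumptions, not taken from the repo, and the token file would in practice be the encrypted secret the commit mentions.

```shell
# Generate a random token and store it with tight permissions
# (stand-in for the encrypted secret the commit refers to).
umask 077
openssl rand -hex 32 > /run/secrets/llama-api-key

# llama-server reads the key at startup and then requires
# "Authorization: Bearer <key>" on protected API endpoints,
# which is why basic-auth in front of it had to go.
llama-server \
  --host 127.0.0.1 --port 8080 \
  -m /models/model.gguf \
  --api-key-file /run/secrets/llama-api-key

# Per the commit, /slots stays public, so localhost sidecars polling it
# need no token; authenticated requests carry the Bearer header:
curl -H "Authorization: Bearer $(cat /run/secrets/llama-api-key)" \
  http://127.0.0.1:8080/v1/models
```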
50453cf0b5 llama-cpp: adjust args 2026-04-02 16:09:17 -04:00
bb6ea2f1d5 llama-cpp: cpu only 2026-04-02 15:32:39 -04:00
f342521d46 llama-cpp: re-add w/ turboquant 2026-04-02 13:42:39 -04:00
65c13babac Revert "re-add llama.cpp (test?)"
This reverts commit 943fa2f531.

Maybe will un-revert once turboquant becomes a thing?
2026-03-30 02:41:39 -04:00
943fa2f531 re-add llama.cpp (test?) 2026-03-30 02:06:50 -04:00
e2529aadc3 fully remove llama-cpp 2026-03-03 14:30:44 -05:00
24691d877e claude'd better security things 2026-03-03 14:30:01 -05:00
5e8a527edf llama.cpp: ngl 8-> 12 2026-03-03 14:29:59 -05:00
1fc1056f9e llama.cpp: reenable + Apriel-1.5-15b-Thinker 2026-03-03 14:29:58 -05:00
e645203118 llama.cpp: testing 2026-03-03 14:29:52 -05:00
16b829ae30 llama-cpp: fix postPatch phase 2026-03-03 14:29:50 -05:00
05933c9b84 llama-cpp: change model 2026-03-03 14:29:49 -05:00
84b0913fa6 llama-cpp: use gpt-oss-20b-mxfp4 2026-03-03 14:28:58 -05:00
07b4fc2d90 extend nixpkgs's lib instead 2026-03-03 14:28:46 -05:00
432d53318a DeepSeek-R1-0528-Qwen3-8B 2026-03-03 14:28:25 -05:00
22f6682cee llama-cpp: use q8 quantization instead of q4 2026-03-03 14:28:22 -05:00
5835da1f7b llama-cpp: disable gpu 2026-03-03 14:28:21 -05:00
c8c150e10c llama-cpp: vulkan broken 2026-03-03 14:28:21 -05:00
efb0bd38e8 llama-cpp: disable flash attn 2026-03-03 14:28:20 -05:00
0f46de5eb7 llama-cpp: nvidia-acereason-7b 2026-03-03 14:28:19 -05:00
cf3e032acb llm: use vulkan 2026-03-03 14:28:00 -05:00
51704a0543 llm: use xiomo model 2026-03-03 14:27:58 -05:00
a5f4f65894 deepcoder 14b 2026-03-03 14:27:47 -05:00
06f47a32af change llm model 2026-03-03 14:27:45 -05:00
d52154770e llm: model stuff 2026-03-03 14:27:40 -05:00
5161e62433 create single function to optimize for system 2026-03-03 14:27:37 -05:00
99978c108b move optimizeWithFlags 2026-03-03 14:27:37 -05:00
3c727db2b2 fmt 2026-03-03 14:27:32 -05:00
d8d90a2cfd llm: use finetuned model 2026-03-03 14:27:30 -05:00
3119cc3594 gemma-3 27b 2026-03-03 14:27:30 -05:00
516e2391a7 llm: use Q4_0 quants (faster) 2026-03-03 14:27:29 -05:00
8ac8f70700 format 2026-03-03 14:27:29 -05:00
4a3b1b14f2 llm: enable AVX2 2026-03-03 14:27:28 -05:00
75ea442642 llama-cpp: compiler optimizations 2026-03-03 14:27:27 -05:00
6cd839cdce gemma-3 12b 2026-03-03 14:27:27 -05:00
6097b3ce0f auth for llm 2026-03-03 14:27:26 -05:00
925031c640 add llama-server 2026-03-03 14:27:26 -05:00