12469de580
llama.cpp: things
2026-04-11 10:27:38 -04:00
75319256f3
lib: add mkCaddyReverseProxy, mkFail2banJail, mkGrafanaAnnotationService, extractArrApiKey
2026-04-09 19:54:57 -04:00
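The helper names suggest small functions that return reusable NixOS config fragments. A hypothetical shape for mkCaddyReverseProxy, as a sketch only (the argument names and structure are assumptions, not this repo's actual API):

  # Hypothetical sketch; signature and option layout are assumed.
  mkCaddyReverseProxy = { host, port }: {
    services.caddy.virtualHosts.${host}.extraConfig = ''
      reverse_proxy 127.0.0.1:${toString port}
    '';
  };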
4f33b16411
llama.cpp: thing
2026-04-09 14:02:53 -04:00
c0390af1a4
llama-cpp: update
Build and Deploy / deploy (push) Successful in 2m33s
2026-04-07 22:29:02 -04:00
98310f2582
organize patches + add gemma4 patch
Build and Deploy / deploy (push) Successful in 2m41s
2026-04-07 20:57:54 -04:00
2884a39eb1
llama-cpp: patch for vulkan support instead
Build and Deploy / deploy (push) Successful in 7m23s
2026-04-07 20:07:02 -04:00
778b04a80f
Reapply "llama-cpp: maybe use vulkan?"
This reverts commit 9addb1569a.
Build and Deploy / deploy (push) Successful in 2m17s
2026-04-07 19:12:57 -04:00
a12dcb01ec
llama-cpp: remove folder
2026-04-06 12:48:28 -04:00
124d33963e
organize
Build and Deploy / deploy (push) Successful in 2m43s
2026-04-03 00:47:12 -04:00
c2ff07b329
llama-cpp: disable
2026-04-03 00:17:38 -04:00
ab9c12cb97
llama-cpp: general changes
2026-04-03 00:17:14 -04:00
0aeb6c5523
llama-cpp: add API key auth via --api-key-file
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
Build and Deploy / deploy (push) Failing after 2m49s
2026-04-02 18:02:23 -04:00
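For context, --api-key-file is llama-server's flag for loading the accepted API key(s) from a file; clients then authenticate with an Authorization: Bearer header. A minimal sketch of the wiring as a plain systemd unit (the unit name, port, and secret path are illustrative, not this repo's actual config):

  # Hedged sketch: unit name, port, and secret path are assumptions.
  { pkgs, lib, ... }:
  {
    systemd.services.llama-cpp = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.ExecStart = lib.concatStringsSep " " [
        "${pkgs.llama-cpp}/bin/llama-server"
        "--host 127.0.0.1"
        "--port 8080"
        # Requests must then carry "Authorization: Bearer <key>".
        "--api-key-file /run/secrets/llama-api-key"
      ];
    };
  }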
50453cf0b5
llama-cpp: adjust args
Build and Deploy / deploy (push) Successful in 2m32s
2026-04-02 16:09:17 -04:00
bb6ea2f1d5
llama-cpp: cpu only
Build and Deploy / deploy (push) Successful in 20m0s
2026-04-02 15:32:39 -04:00
f342521d46
llama-cpp: re-add w/ turboquant
Build and Deploy / deploy (push) Successful in 28m52s
2026-04-02 13:42:39 -04:00
65c13babac
Revert "re-add llama.cpp (test?)"
This reverts commit 943fa2f531.
Maybe I'll un-revert once turboquant becomes a thing?
2026-03-30 02:41:39 -04:00
943fa2f531
re-add llama.cpp (test?)
2026-03-30 02:06:50 -04:00
e2529aadc3
fully remove llama-cpp
2026-03-03 14:30:44 -05:00
24691d877e
claude'd better security things
2026-03-03 14:30:01 -05:00
5e8a527edf
llama.cpp: ngl 8 -> 12
2026-03-03 14:29:59 -05:00
1fc1056f9e
llama.cpp: reenable + Apriel-1.5-15b-Thinker
2026-03-03 14:29:58 -05:00
e645203118
llama.cpp: testing
2026-03-03 14:29:52 -05:00
16b829ae30
llama-cpp: fix postPatch phase
2026-03-03 14:29:50 -05:00
05933c9b84
llama-cpp: change model
2026-03-03 14:29:49 -05:00
84b0913fa6
llama-cpp: use gpt-oss-20b-mxfp4
2026-03-03 14:28:58 -05:00
07b4fc2d90
extend nixpkgs's lib instead
2026-03-03 14:28:46 -05:00
432d53318a
DeepSeek-R1-0528-Qwen3-8B
2026-03-03 14:28:25 -05:00
22f6682cee
llama-cpp: use q8 quantization instead of q4
2026-03-03 14:28:22 -05:00
5835da1f7b
llama-cpp: disable gpu
2026-03-03 14:28:21 -05:00
c8c150e10c
llama-cpp: vulkan broken
2026-03-03 14:28:21 -05:00
efb0bd38e8
llama-cpp: disable flash attn
2026-03-03 14:28:20 -05:00
0f46de5eb7
llama-cpp: nvidia-acereason-7b
2026-03-03 14:28:19 -05:00
cf3e032acb
llm: use vulkan
2026-03-03 14:28:00 -05:00
51704a0543
llm: use Xiaomi MiMo model
2026-03-03 14:27:58 -05:00
a5f4f65894
deepcoder 14b
2026-03-03 14:27:47 -05:00
06f47a32af
change llm model
2026-03-03 14:27:45 -05:00
d52154770e
llm: model stuff
2026-03-03 14:27:40 -05:00
5161e62433
create single function to optimize for system
2026-03-03 14:27:37 -05:00
99978c108b
move optimizeWithFlags
2026-03-03 14:27:37 -05:00
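optimizeWithFlags is presumably the usual overrideAttrs wrapper that appends extra compiler flags to a package build. A minimal sketch under that assumption (illustrative, not necessarily this repo's implementation):

  # Assumed shape; appends flags via NIX_CFLAGS_COMPILE.
  optimizeWithFlags = pkg: flags:
    pkg.overrideAttrs (old: {
      NIX_CFLAGS_COMPILE =
        toString (old.NIX_CFLAGS_COMPILE or "")
        + " " + builtins.concatStringsSep " " flags;
    });

  # e.g. optimizeWithFlags pkgs.llama-cpp [ "-O3" "-march=x86-64-v3" "-mavx2" ]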
3c727db2b2
fmt
2026-03-03 14:27:32 -05:00
d8d90a2cfd
llm: use finetuned model
2026-03-03 14:27:30 -05:00
3119cc3594
gemma-3 27b
2026-03-03 14:27:30 -05:00
516e2391a7
llm: use Q4_0 quants (faster)
2026-03-03 14:27:29 -05:00
8ac8f70700
format
2026-03-03 14:27:29 -05:00
4a3b1b14f2
llm: enable AVX2
2026-03-03 14:27:28 -05:00
75ea442642
llama-cpp: compiler optimizations
2026-03-03 14:27:27 -05:00
6cd839cdce
gemma-3 12b
2026-03-03 14:27:27 -05:00
6097b3ce0f
auth for llm
2026-03-03 14:27:26 -05:00
925031c640
add llama-server
2026-03-03 14:27:26 -05:00