12469de580
llama.cpp: things
2026-04-11 10:27:38 -04:00
75319256f3
lib: add mkCaddyReverseProxy, mkFail2banJail, mkGrafanaAnnotationService, extractArrApiKey
2026-04-09 19:54:57 -04:00
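The helper names suggest small functions that return reusable NixOS config fragments. A hypothetical shape for mkCaddyReverseProxy, as a sketch only (the argument names and structure are assumptions, not this repo's actual API):

  # Hypothetical sketch; signature and option layout are assumed.
  mkCaddyReverseProxy = { host, port }: {
    services.caddy.virtualHosts.${host}.extraConfig = ''
      reverse_proxy 127.0.0.1:${toString port}
    '';
  };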
4f33b16411
llama.cpp: thing
2026-04-09 14:02:53 -04:00
c0390af1a4
llama-cpp: update
Build and Deploy / deploy (push) Successful in 2m33s
2026-04-07 22:29:02 -04:00
98310f2582
organize patches + add gemma4 patch
Build and Deploy / deploy (push) Successful in 2m41s
2026-04-07 20:57:54 -04:00
2884a39eb1
llama-cpp: patch for vulkan support instead
Build and Deploy / deploy (push) Successful in 7m23s
2026-04-07 20:07:02 -04:00
778b04a80f
Reapply "llama-cpp: maybe use vulkan?"
This reverts commit 9addb1569a.
Build and Deploy / deploy (push) Successful in 2m17s
2026-04-07 19:12:57 -04:00
a12dcb01ec
llama-cpp: remove folder
2026-04-06 12:48:28 -04:00
124d33963e
organize
Build and Deploy / deploy (push) Successful in 2m43s
2026-04-03 00:47:12 -04:00
c2ff07b329
llama-cpp: disable
2026-04-03 00:17:38 -04:00
ab9c12cb97
llama-cpp: general changes
2026-04-03 00:17:14 -04:00
0aeb6c5523
llama-cpp: add API key auth via --api-key-file
Generate and encrypt a Bearer token for llama-cpp's built-in auth.
Remove caddy_auth from the vhost since basic auth blocks Bearer-only
clients. Internal sidecars (xmrig-pause, annotations) connect
directly to localhost and are unaffected (/slots is public).
Build and Deploy / deploy (push) Failing after 2m49s
2026-04-02 18:02:23 -04:00
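For context, --api-key-file is llama-server's flag for loading the accepted API key(s) from a file; clients then authenticate with an Authorization: Bearer header. A minimal sketch of the wiring as a plain systemd unit (the unit name, port, and secret path are illustrative, not this repo's actual config):

  # Hedged sketch: unit name, port, and secret path are assumptions.
  { pkgs, lib, ... }:
  {
    systemd.services.llama-cpp = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.ExecStart = lib.concatStringsSep " " [
        "${pkgs.llama-cpp}/bin/llama-server"
        "--host 127.0.0.1"
        "--port 8080"
        # Requests must then carry "Authorization: Bearer <key>".
        "--api-key-file /run/secrets/llama-api-key"
      ];
    };
  }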
50453cf0b5
llama-cpp: adjust args
Build and Deploy / deploy (push) Successful in 2m32s
2026-04-02 16:09:17 -04:00
bb6ea2f1d5
llama-cpp: cpu only
Build and Deploy / deploy (push) Successful in 20m0s
2026-04-02 15:32:39 -04:00
f342521d46
llama-cpp: re-add w/ turboquant
Build and Deploy / deploy (push) Successful in 28m52s
2026-04-02 13:42:39 -04:00
65c13babac
Revert "re-add llama.cpp (test?)"
This reverts commit 943fa2f531.
Maybe I'll un-revert once turboquant becomes a thing?
2026-03-30 02:41:39 -04:00
943fa2f531
re-add llama.cpp (test?)
2026-03-30 02:06:50 -04:00
e2529aadc3
fully remove llama-cpp
2026-03-03 14:30:44 -05:00
24691d877e
claude'd better security things
2026-03-03 14:30:01 -05:00
5e8a527edf
llama.cpp: ngl 8 -> 12
2026-03-03 14:29:59 -05:00
1fc1056f9e
llama.cpp: reenable + Apriel-1.5-15b-Thinker
2026-03-03 14:29:58 -05:00
e645203118
llama.cpp: testing
2026-03-03 14:29:52 -05:00
16b829ae30
llama-cpp: fix postPatch phase
2026-03-03 14:29:50 -05:00
05933c9b84
llama-cpp: change model
2026-03-03 14:29:49 -05:00
84b0913fa6
llama-cpp: use gpt-oss-20b-mxfp4
2026-03-03 14:28:58 -05:00
07b4fc2d90
extend nixpkgs's lib instead
2026-03-03 14:28:46 -05:00
432d53318a
DeepSeek-R1-0528-Qwen3-8B
2026-03-03 14:28:25 -05:00
22f6682cee
llama-cpp: use q8 quantization instead of q4
2026-03-03 14:28:22 -05:00
5835da1f7b
llama-cpp: disable gpu
2026-03-03 14:28:21 -05:00
c8c150e10c
llama-cpp: vulkan broken
2026-03-03 14:28:21 -05:00
efb0bd38e8
llama-cpp: disable flash attn
2026-03-03 14:28:20 -05:00
0f46de5eb7
llama-cpp: nvidia-acereason-7b
2026-03-03 14:28:19 -05:00
cf3e032acb
llm: use vulkan
2026-03-03 14:28:00 -05:00
51704a0543
llm: use Xiaomi MiMo model
2026-03-03 14:27:58 -05:00
a5f4f65894
deepcoder 14b
2026-03-03 14:27:47 -05:00
06f47a32af
change llm model
2026-03-03 14:27:45 -05:00
d52154770e
llm: model stuff
2026-03-03 14:27:40 -05:00
5161e62433
create single function to optimize for system
2026-03-03 14:27:37 -05:00
99978c108b
move optimizeWithFlags
2026-03-03 14:27:37 -05:00
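optimizeWithFlags is presumably the usual overrideAttrs wrapper that appends extra compiler flags to a package build. A minimal sketch under that assumption (illustrative, not necessarily this repo's implementation):

  # Assumed shape; appends flags via NIX_CFLAGS_COMPILE.
  optimizeWithFlags = pkg: flags:
    pkg.overrideAttrs (old: {
      NIX_CFLAGS_COMPILE =
        toString (old.NIX_CFLAGS_COMPILE or "")
        + " " + builtins.concatStringsSep " " flags;
    });

  # e.g. optimizeWithFlags pkgs.llama-cpp [ "-O3" "-march=x86-64-v3" "-mavx2" ]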
3c727db2b2
fmt
2026-03-03 14:27:32 -05:00
d8d90a2cfd
llm: use finetuned model
2026-03-03 14:27:30 -05:00
3119cc3594
gemma-3 27b
2026-03-03 14:27:30 -05:00
516e2391a7
llm: use Q4_0 quants (faster)
2026-03-03 14:27:29 -05:00
8ac8f70700
format
2026-03-03 14:27:29 -05:00
4a3b1b14f2
llm: enable AVX2
2026-03-03 14:27:28 -05:00
75ea442642
llama-cpp: compiler optimizations
2026-03-03 14:27:27 -05:00
6cd839cdce
gemma-3 12b
2026-03-03 14:27:27 -05:00
6097b3ce0f
auth for llm
2026-03-03 14:27:26 -05:00
925031c640
add llama-server
2026-03-03 14:27:26 -05:00