# Gemma LLM - Removed
## Why It Was Removed

Gemma 4 E2B ran CPU-only inference on the free-tier ARM64 nodes (llama.cpp, Q4_K_M GGUF). The 12K+ token system prompts OpenClaw generates took 15+ minutes to process on ARM CPUs, causing persistent agent timeouts.
After exhausting the available optimization options (batch size, context window, prompt caching, quantization levels), local inference was replaced with cloud APIs, which deliver sub-second responses with no resource pressure on the nodes.
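To put those numbers in perspective, the implied prompt-processing throughput can be back-calculated. The sketch below assumes exactly 12K tokens and exactly 15 minutes; the text gives both only as lower bounds.

```python
# Back-of-envelope: prompt-processing throughput implied by the timeouts.
prompt_tokens = 12_000        # lower bound on OpenClaw system prompt size
processing_seconds = 15 * 60  # lower bound on observed processing time

tokens_per_second = prompt_tokens / processing_seconds
print(f"{tokens_per_second:.1f} tokens/s")  # prints "13.3 tokens/s"
```

At roughly 13 tokens/s of prompt processing, even aggressive caching cannot keep a 12K-token prompt inside a typical agent timeout, which is consistent with the decision to drop local inference.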
## Migration

| Before | After |
|---|---|
| `ghcr.io/nsudhanva/llama-server` sidecar | No local LLM |
| `gemma.k8s.sudhanva.me` endpoint | Removed |
| `gemma-api-key` secret | Removed |
| `argocd/apps/gemma/` | Deleted |
| 5Gi model cache PVC | Freed |
OpenClaw's model configuration in `argocd/apps/openclaw/openclawinstance.yaml` handles all AI provider routing. See the OpenClaw docs for details.
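As a rough sketch of what that routing looks like after the migration (the field names below are illustrative only, not the real `openclawinstance.yaml` schema; consult the OpenClaw docs for the actual format):

```yaml
# Illustrative sketch -- hypothetical field names, not the real schema.
spec:
  model:
    provider: cloud-api          # replaces the removed local llama-server sidecar
    apiKeySecretRef:
      name: cloud-llm-api-key    # hypothetical secret; the old gemma-api-key was removed
```

The key point is that routing is now declared in one place in the OpenClaw instance manifest, so no per-model sidecar, Ingress endpoint, or model-cache PVC needs to exist in the cluster.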