
Gemma LLM - Removed

Gemma 3n E2B ran CPU-only inference on the free-tier ARM64 nodes (llama.cpp, Q4_K_M GGUF). The 12K+ token system prompts OpenClaw generates took 15+ minutes to process on ARM CPUs, causing persistent agent timeouts.
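As a back-of-envelope sanity check on that bottleneck: the prefill throughput implied by the figures above (12K+ tokens in roughly 15 minutes) works out to about 13 tokens/s. The rate below is an assumption derived from those numbers, not a measured benchmark:

```python
# Rough check of the prompt-processing bottleneck described above.
# PREFILL_TOK_PER_S is inferred from the page's own figures
# (12K tokens in ~15 minutes); it is an assumption, not a benchmark.
PROMPT_TOKENS = 12_000
PREFILL_TOK_PER_S = 13.3  # assumed ARM64 CPU prefill throughput

seconds = PROMPT_TOKENS / PREFILL_TOK_PER_S
print(f"{seconds / 60:.1f} min to process the system prompt")  # ≈ 15.0 min
```

At that rate, every agent turn pays a double-digit-minute tax before generation even starts, which is why no amount of batching or quantization tuning could close the gap.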

After the available optimizations (batch size, context window, prompt caching, quantization levels) were exhausted, local inference was replaced with cloud APIs. Cloud models deliver sub-second responses with no pressure on node resources.

| Before | After |
| --- | --- |
| ghcr.io/nsudhanva/llama-server sidecar | No local LLM |
| gemma.k8s.sudhanva.me endpoint | Removed |
| gemma-api-key secret | Removed |
| argocd/apps/gemma/ | Deleted |
| 5Gi model cache PVC | Freed |

OpenClaw’s model configuration in argocd/apps/openclaw/openclawinstance.yaml handles all AI provider routing. See the OpenClaw docs for details.