Agent Zero utility model crashes on history compression with Ollama — litellm 405 "method not allowed" on /api/generate
**Setup**

- Agent Zero latest image (`agent0ai/agent-zero:latest`)
- litellm 1.79.3 (bundled in the container)
- Chat model: Anthropic via OpenRouter (works fine)
- Utility model: switched to local Ollama (`qwen3:32b` on an RTX 5090)
- Ollama runs on the host, reachable from the container at `http://host.docker.internal:11434`

**The problem**

Every time A0's conversation history gets long enough to trigger compression (`_90_organize_history_wait.py` → `compress_attention` → `call_utility_model`), it crashes with:

```
litellm.APIConnectionError: OllamaException - 405 method not allowed
URL: http://host.docker.internal:11434/api/generate
```

The crash is fatal: it kills the entire agent session.

**What I've verified**

- Ollama is healthy: `curl -X POST http://host.docker.internal:11434/api/generate` with a valid payload works perfectly from inside the A0 container
- `qwen3:32b` is loaded and running on the GPU (27 GB VRAM, not a CPU fallback)
- settings.json is correct:

  ```json
  "util_model_provider": "ollama",
  "util_model_name": "qwen3:32b",
  "util_model_api_base": "http://host.docker.internal:11434"
  ```

- Short tasks complete fine; the crash only happens when history is long enough to trigger the summarization/compression path
- Tried `ollama` as the provider (per the setup guide): same 405
- Tried `ollama_chat` as the provider: same 405

**What I think is happening**

litellm 1.79.3's Ollama provider is sending a malformed request to `/api/generate`: either the wrong HTTP method, the wrong content-type, or the wrong payload format. A 405 means the endpoint rejected the HTTP method itself (likely a GET instead of a POST); a malformed body would normally produce a 400, so the method is the prime suspect. The main chat model (Anthropic/OpenRouter) works because it never touches this code path; only the utility model goes through the Ollama provider, and only during compression.