Agent Zero utility model crashes on history compression with Ollama — litellm 405 "method not allowed" on /api/generate
Setup:
- Agent Zero latest image (agent0ai/agent-zero:latest)
- litellm 1.79.3 (bundled in the container)
- Chat model: Anthropic via OpenRouter (works fine)
- Utility model: switched to local Ollama (qwen3:32b on an RTX 5090)
- Ollama runs on the host, reachable from the container at http://host.docker.internal:11434
The problem:
Every time A0's conversation history gets long enough to trigger compression (_90_organize_history_wait.py → compress_attention → call_utility_model), it crashes with:

litellm.APIConnectionError: OllamaException - 405 method not allowed

The crash is fatal: it kills the entire agent session.
What I've verified:
- Ollama is healthy: curl -X POST http://host.docker.internal:11434/api/generate with a valid payload works perfectly from inside the A0 container
- qwen3:32b is loaded and running on the GPU (27 GB VRAM, not CPU fallback)
- settings.json is correct:
    "util_model_provider": "ollama",
    "util_model_name": "qwen3:32b",
    "util_model_api_base": "http://host.docker.internal:11434"
- Short tasks complete fine; the crash only happens when history is long enough to trigger the summarization/compression path
- Tried ollama as the provider (per the setup guide): same 405
- Tried ollama_chat as the provider: same 405
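For anyone who wants to repeat the curl check without leaving Python, here is a stdlib-only sketch of the same probe from inside the A0 container. The base URL and model name mirror the settings above; build_generate_request and probe are hypothetical helper names, not part of Agent Zero or litellm.

```python
import json
import urllib.error
import urllib.request

# Assumption: same base URL as in settings.json above.
OLLAMA_BASE = "http://host.docker.internal:11434"

def build_generate_request(model: str, prompt: str, base: str = OLLAMA_BASE):
    """Build the POST that /api/generate expects: a JSON body with at least
    'model' and 'prompt'; 'stream': False keeps the reply single-shot."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        base + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # any other method earns the 405 seen in the traceback
    )

def probe(req):
    """Return (status, body) for a request, including error responses."""
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as e:
        return e.code, e.read().decode(errors="replace")

# Usage (inside the A0 container, with Ollama running on the host):
#   status, body = probe(build_generate_request("qwen3:32b", "Say hi."))
```

If this returns 200 while litellm still gets 405 through the same base URL, the endpoint itself is exonerated.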
What I think is happening:
litellm 1.79.3's Ollama provider is sending a request that /api/generate won't accept. HTTP 405 specifically means "method not allowed", so Ollama is most likely receiving something other than a POST on that path; a malformed payload or wrong content type would normally come back as 400, not 405.
The main chat model (Anthropic/OpenRouter) works because it never touches this code path. Only the utility model goes through the Ollama provider, and only during compression.
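To make that triage explicit, one can probe /api/generate with both methods and read the status codes off a small decision table. interpret_generate_status below is a hypothetical helper encoding the reasoning above; it assumes only standard HTTP semantics, nothing litellm- or Agent-Zero-specific.

```python
# Hypothetical triage helper: map (HTTP method, status) pairs from manual
# probes of Ollama's /api/generate to a likely diagnosis.
def interpret_generate_status(method: str, status: int) -> str:
    if method == "POST" and status == 200:
        return "endpoint is fine; suspect litellm's method or path, not Ollama"
    if method == "POST" and status == 405:
        return "even a clean POST is rejected; check for a proxy or base-URL rewrite"
    if method == "GET" and status == 405:
        return "expected: /api/generate is POST-only, so a GET reproduces the error"
    if status == 400:
        return "method is fine but the payload is malformed"
    return f"unexpected combination: {method} -> {status}"
```

Since the manual curl POST already succeeds, the interesting case is the first one, which points the finger back at what litellm puts on the wire.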
Questions:
- Has anyone successfully run the utility model on local Ollama? What provider/model/base settings did you use?
- Is there a known litellm version compatibility issue with newer Ollama versions?
- Is there a way to disable history compression entirely as a workaround?
- Would setting the utility model to use the OpenAI-compatible endpoint (http://host:11434/v1) with provider openai work instead of the native Ollama provider?
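For that last question, the change would look something like this in settings.json, reusing the key names from the snippet above. This is an untested sketch: /v1 is Ollama's OpenAI-compatible endpoint, and util_model_api_key is an assumption on my part (OpenAI-style providers usually require a non-empty key, and Ollama ignores its value).

```json
{
  "util_model_provider": "openai",
  "util_model_name": "qwen3:32b",
  "util_model_api_base": "http://host.docker.internal:11434/v1",
  "util_model_api_key": "ollama"
}
```

That would route the utility model through /v1/chat/completions and bypass the native /api/generate code path entirely.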
Environment:
- Docker, agent-zero container with host networking via extra_hosts: host.docker.internal:host-gateway
- Ollama v0.x (latest) with GPU passthrough
- Host: Ubuntu, RTX 5090, 186 GB RAM
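For completeness, the compose wiring described above would look roughly like this (a fragment reconstructed from the description, not the actual file):

```yaml
# Assumed docker-compose fragment matching the setup described above.
services:
  agent-zero:
    image: agent0ai/agent-zero:latest
    extra_hosts:
      - "host.docker.internal:host-gateway"  # resolve the name to the host's gateway IP
```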
Any help appreciated — the agent works great until compression kills it.
TIA
Hobson
Posted by Pilot Hobs in the Agent Zero community (skool.com/agent-zero)