Agent Zero utility model crashes on history compression with Ollama — litellm 405 "method not allowed" on /api/generate
Setup:
- Agent Zero latest image (agent0ai/agent-zero:latest)
- litellm 1.79.3 (bundled in the container)
- Chat model: Anthropic via OpenRouter (works fine)
- Utility model: switched to local Ollama (qwen3:32b on an RTX 5090)
- Ollama runs on the host, reachable from the container at http://host.docker.internal:11434
The problem:
Every time A0's conversation history gets long enough to trigger compression (_90_organize_history_wait.py → compress_attention → call_utility_model), it crashes with:

litellm.APIConnectionError: OllamaException - 405 method not allowed

The crash is fatal: it kills the entire agent session.
What I've verified:
- Ollama is healthy: curl -X POST http://host.docker.internal:11434/api/generate with a valid payload works perfectly from inside the A0 container
- qwen3:32b is loaded and running on the GPU (27 GB VRAM, not CPU fallback)
- settings.json is correct:
    "util_model_provider": "ollama",
    "util_model_name": "qwen3:32b",
    "util_model_api_base": "http://host.docker.internal:11434"
- Short tasks complete fine; the crash only happens when history is long enough to trigger the summarization/compression path
- Tried ollama as the provider (per the setup guide): same 405
- Tried ollama_chat as the provider: same 405
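For anyone who wants to repeat the curl check without leaving Python, here is a stdlib-only sketch of the same probe from inside the A0 container. The base URL and model name mirror the settings above; build_generate_request and probe are hypothetical helper names, not part of Agent Zero or litellm.

```python
import json
import urllib.error
import urllib.request

# Assumption: same base URL as in settings.json above.
OLLAMA_BASE = "http://host.docker.internal:11434"

def build_generate_request(model: str, prompt: str, base: str = OLLAMA_BASE):
    """Build the POST that /api/generate expects: a JSON body with at least
    'model' and 'prompt'; 'stream': False keeps the reply single-shot."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        base + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # any other method earns the 405 seen in the traceback
    )

def probe(req):
    """Return (status, body) for a request, including error responses."""
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as e:
        return e.code, e.read().decode(errors="replace")

# Usage (inside the A0 container, with Ollama running on the host):
#   status, body = probe(build_generate_request("qwen3:32b", "Say hi."))
```

If this returns 200 while litellm still gets 405 through the same base URL, the endpoint itself is exonerated.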
What I think is happening:
litellm 1.79.3's Ollama provider is sending a request that /api/generate won't accept. HTTP 405 specifically means "method not allowed", so Ollama is most likely receiving something other than a POST on that path; a malformed payload or wrong content type would normally come back as 400, not 405.
The main chat model (Anthropic/OpenRouter) works because it never touches this code path. Only the utility model goes through the Ollama provider, and only during compression.
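To make that triage explicit, one can probe /api/generate with both methods and read the status codes off a small decision table. interpret_generate_status below is a hypothetical helper encoding the reasoning above; it assumes only standard HTTP semantics, nothing litellm- or Agent-Zero-specific.

```python
# Hypothetical triage helper: map (HTTP method, status) pairs from manual
# probes of Ollama's /api/generate to a likely diagnosis.
def interpret_generate_status(method: str, status: int) -> str:
    if method == "POST" and status == 200:
        return "endpoint is fine; suspect litellm's method or path, not Ollama"
    if method == "POST" and status == 405:
        return "even a clean POST is rejected; check for a proxy or base-URL rewrite"
    if method == "GET" and status == 405:
        return "expected: /api/generate is POST-only, so a GET reproduces the error"
    if status == 400:
        return "method is fine but the payload is malformed"
    return f"unexpected combination: {method} -> {status}"
```

Since the manual curl POST already succeeds, the interesting case is the first one, which points the finger back at what litellm puts on the wire.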
Questions:
- Has anyone successfully run the utility model on local Ollama? What provider/model/base settings did you use?
- Is there a known litellm version compatibility issue with newer Ollama versions?
- Is there a way to disable history compression entirely as a workaround?
- Would setting the utility model to use the OpenAI-compatible endpoint (http://host:11434/v1) with provider openai work instead of the native Ollama provider?
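For that last question, the change would look something like this in settings.json, reusing the key names from the snippet above. This is an untested sketch: /v1 is Ollama's OpenAI-compatible endpoint, and util_model_api_key is an assumption on my part (OpenAI-style providers usually require a non-empty key, and Ollama ignores its value).

```json
{
  "util_model_provider": "openai",
  "util_model_name": "qwen3:32b",
  "util_model_api_base": "http://host.docker.internal:11434/v1",
  "util_model_api_key": "ollama"
}
```

That would route the utility model through /v1/chat/completions and bypass the native /api/generate code path entirely.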
Environment:
- Docker, agent-zero container with host networking via extra_hosts: host.docker.internal:host-gateway
- Ollama v0.x (latest) with GPU passthrough
- Host: Ubuntu, RTX 5090, 186 GB RAM
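For completeness, the compose wiring described above would look roughly like this (a fragment reconstructed from the description, not the actual file):

```yaml
# Assumed docker-compose fragment matching the setup described above.
services:
  agent-zero:
    image: agent0ai/agent-zero:latest
    extra_hosts:
      - "host.docker.internal:host-gateway"  # resolve the name to the host's gateway IP
```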
Any help appreciated — the agent works great until compression kills it.
TIA
Hobson
Posted by Pilot Hobs in the Agent Zero community (skool.com/agent-zero)