Timeout and thinking field issue when using gpt-oss:20b (Ollama) with Agent-Zero v0.9.5 on Windows 11
Hi everyone,
I’m running Agent-Zero v0.9.5 together with Ollama locally on Windows 11, and I’m having trouble when using the gpt-oss:20b model.
The main issue is a timeout after 600 seconds with no response:
litellm.exceptions.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out. Timeout passed=600.0, time taken=600.011 seconds
In other cases, when streaming is enabled, LiteLLM fails while parsing the response chunks because the model outputs an additional "thinking" field:
Exception: Unable to parse ollama chunk - {'model': 'gpt-oss:20b', 'response': '', 'thinking': 'We', 'done': False}
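To make the failing case concrete, here is a minimal sketch of a chunk reader that tolerates the extra "thinking" field instead of rejecting it. It is not LiteLLM's or Agent-Zero's actual parser. The function names parse_ollama_chunk and collect_stream are hypothetical, and the sample lines are modeled on the chunk in the error above, not captured from a real run.

```python
import json
from typing import Iterable, Tuple


def parse_ollama_chunk(line: str) -> Tuple[str, str, bool]:
    """Parse one streamed JSON line into (response, thinking, done).

    Hypothetical helper: missing or null fields default to empty values,
    so a chunk that only carries "thinking" is not treated as an error.
    """
    chunk = json.loads(line)
    response = chunk.get("response") or ""
    thinking = chunk.get("thinking") or ""
    done = bool(chunk.get("done", False))
    return response, thinking, done


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate answer text and thinking text separately."""
    answer_parts, thinking_parts = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        response, thinking, done = parse_ollama_chunk(line)
        answer_parts.append(response)
        thinking_parts.append(thinking)
        if done:
            break
    return "".join(answer_parts), "".join(thinking_parts)


# Sample lines shaped like the chunk in the error message (illustrative only).
sample = [
    '{"model": "gpt-oss:20b", "response": "", "thinking": "We", "done": false}',
    '{"model": "gpt-oss:20b", "response": "Hello", "done": false}',
    '{"model": "gpt-oss:20b", "response": "", "done": true}',
]
answer, thinking = collect_stream(sample)
print("answer:", answer)
print("thinking:", thinking)
```

Keeping the thinking text in its own buffer means it can be logged or dropped without ending up in the answer the agent sees.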
With other models like gemma3:12b-it-qat or qwen2.5-coder, everything works fine. The problem seems limited to gpt-oss:20b, and the timeout in particular seems tied to its long cold-start latency.
Things I’ve already tried:
Pointing api_base to http://127.0.0.1:11434 (works fine for other models).
Forcing stream: false only for gpt-oss:20b.
Warming up the model with keep_alive: "1h" and a reduced num_predict.
Increasing the timeout above 600 s.
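For reference, the warm-up attempt above can be sketched as a single non-streaming call to Ollama's /api/generate endpoint. This is only an illustration of what I tried, under these assumptions: the helper names build_warmup_request and warm_up, the "ping" prompt, the num_predict value of 8, and the 900-second client timeout are all arbitrary choices of mine, not Agent-Zero settings.

```python
import json
import urllib.request

OLLAMA_BASE = "http://127.0.0.1:11434"  # same api_base as in the list above


def build_warmup_request(model: str = "gpt-oss:20b",
                         keep_alive: str = "1h",
                         num_predict: int = 8,
                         base: str = OLLAMA_BASE) -> urllib.request.Request:
    """Build a small non-streaming generate request (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": "ping",            # assumed throwaway prompt
        "stream": False,             # one JSON object instead of chunks
        "keep_alive": keep_alive,    # keep the model loaded after the call
        "options": {"num_predict": num_predict},  # cap generated tokens
    }
    return urllib.request.Request(
        f"{base}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_up(timeout_s: float = 900.0) -> dict:
    """Send the warm-up request; needs a running Ollama server."""
    request = build_warmup_request()
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```

Calling warm_up() once before starting the agent loads the model ahead of the first real request, so the cold start is not counted against the agent's own timeout.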
My question is:
👉 Is there any recommended configuration or official integration in Agent-Zero for handling Ollama/LM Studio models that emit extra fields like thinking, or that take a very long time to produce the first token?
I’d like to use gpt-oss:20b locally on Windows 11 without breaking the general functionality of the agent, and still keep streaming enabled for other models.
Thanks in advance for any guidance or a simpler solution!
Kelvin Leonard Mendoza Ceballo