Timeout and thinking field issue when using gpt-oss:20b (Ollama) with Agent-Zero v0.9.5 on Windows 11

Sep '25 • ❓ Q&A

Hi everyone,

I’m running Agent-Zero v0.9.5 together with Ollama locally on Windows 11, and I’m having trouble when using the gpt-oss:20b model.

The main issue is a timeout after 600 seconds without response:

litellm.exceptions.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out. Timeout passed=600.0, time taken=600.011 seconds

In other cases, when streaming is enabled, LiteLLM fails while parsing the response chunks because the model outputs an additional "thinking" field:

Exception: Unable to parse ollama chunk - {'model': 'gpt-oss:20b', 'response': '', 'thinking': 'We', 'done': False}
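
To illustrate the shape of the problem, here is a minimal sketch (not LiteLLM's actual parser) of reading such chunks while tolerating the extra field. It assumes the stream arrives as newline-delimited JSON with the keys shown in the error above. The `collect_stream` helper and the sample lines after the first are hypothetical.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate answer text and reasoning text from JSON chunk lines.

    Assumption: each line is one JSON object with an optional "response",
    an optional "thinking", and a boolean "done" key.
    """
    answer, thinking = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        # Treat missing or empty fields as empty strings instead of failing.
        answer.append(chunk.get("response") or "")
        thinking.append(chunk.get("thinking") or "")
        if chunk.get("done"):
            break
    return "".join(answer), "".join(thinking)


# Made-up sample data; only the first chunk mirrors the error message above.
sample = [
    '{"model": "gpt-oss:20b", "response": "", "thinking": "We", "done": false}',
    '{"model": "gpt-oss:20b", "response": "Hi", "done": false}',
    '{"model": "gpt-oss:20b", "response": "", "done": true}',
]
answer, thinking = collect_stream(sample)
```

Keeping the reasoning text separate from the answer means a chunk with an empty "response" is no longer treated as unparseable.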

With other models such as gemma3:12b-it-qat or qwen2.5-coder, everything works fine. The problem seems limited to gpt-oss:20b, and in some cases to its long cold-start latency.

Things I’ve already tried:

Pointing api_base to http://127.0.0.1:11434 (works fine for other models).

Forcing stream: false only for gpt-oss:20b.

Warming up the model with keep_alive: "1h" and a reduced num_predict.

Increasing the timeout above 600 s.
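
For reference, the warm-up step looks roughly like the sketch below. It is a minimal example, assuming Ollama's `/api/generate` endpoint at the api_base above. The helper names, the "ping" prompt, and the 900-second timeout are placeholders of mine, not Agent-Zero settings.

```python
import json
import urllib.request

OLLAMA_BASE = "http://127.0.0.1:11434"  # same api_base as in the list above


def build_warmup_payload(model: str, keep_alive: str = "1h",
                         num_predict: int = 1) -> dict:
    """Build a tiny non-streaming request that loads the model into memory."""
    return {
        "model": model,
        "prompt": "ping",            # placeholder prompt
        "stream": False,             # one JSON object instead of chunks
        "keep_alive": keep_alive,    # keep the model loaded after the call
        "options": {"num_predict": num_predict},  # cap generated tokens
    }


def warm_up(model: str, timeout_s: float = 900.0) -> dict:
    """POST the warm-up payload and return the decoded JSON reply.

    timeout_s is a socket-level timeout and only an illustrative value.
    """
    body = json.dumps(build_warmup_payload(model)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_BASE}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `warm_up("gpt-oss:20b")` once before starting the agent should pay the cold-start cost up front, provided the Ollama server is running.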

My question is:

👉 Is there a recommended configuration or official integration in Agent-Zero for handling Ollama/LM Studio models that emit extra fields such as thinking, or that take a very long time to produce the first token?

I’d like to use gpt-oss:20b locally on Windows 11 without breaking the general functionality of the agent, and still keep streaming enabled for other models.
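
The behavior I'm after could be sketched as a per-model override table like the one below. This is purely illustrative: `MODEL_OVERRIDES`, `DEFAULTS`, and `call_kwargs` are hypothetical names, not existing Agent-Zero or LiteLLM options, and the timeout values are arbitrary.

```python
# Hypothetical defaults applied to every model.
DEFAULTS = {"stream": True, "timeout": 600}

# Hypothetical per-model exceptions; only gpt-oss:20b deviates.
MODEL_OVERRIDES = {
    "gpt-oss:20b": {"stream": False, "timeout": 1800},
}


def call_kwargs(model: str) -> dict:
    """Merge the defaults with any override registered for this model."""
    return {**DEFAULTS, **MODEL_OVERRIDES.get(model, {})}
```

With this, `call_kwargs("gemma3:12b-it-qat")` keeps streaming on, while `call_kwargs("gpt-oss:20b")` turns it off and allows a longer wait.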

Thanks in advance for any guidance or a simpler solution!
