Timeout and thinking field issue when using gpt-oss:20b (Ollama) with Agent-Zero v0.9.5 on Windows 11
Hi everyone,

I’m running Agent-Zero v0.9.5 together with Ollama locally on Windows 11, and I’m having trouble with the gpt-oss:20b model.

The main issue is a timeout after 600 seconds with no response:

    litellm.exceptions.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out. Timeout passed=600.0, time taken=600.011 seconds

In other cases, when streaming is enabled, LiteLLM fails while parsing the response chunks because the model outputs an additional "thinking" field:

    Exception: Unable to parse ollama chunk - {'model': 'gpt-oss:20b', 'response': '', 'thinking': 'We', 'done': False}

With other models such as gemma3:12b-it-qat or qwen2.5-coder, everything works fine. The problem seems limited to gpt-oss:20b, and in some cases to its long cold-start latency.

Things I’ve already tried:

- Pointing api_base to http://127.0.0.1:11434 (works fine for other models).
- Forcing stream: false only for gpt-oss:20b.
- Warming up the model with keep_alive: "1h" and a reduced num_predict.
- Increasing the timeout above 600s.

My question is:

👉 Is there a recommended configuration or official integration in Agent-Zero for handling Ollama/LM Studio models that emit extra fields like thinking, or that take a very long time to produce the first token?

I’d like to use gpt-oss:20b locally on Windows 11 without breaking the general functionality of the agent, while keeping streaming enabled for other models.

Thanks in advance for any guidance or a simpler solution!
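
For anyone who wants to reproduce this outside of LiteLLM, here is a minimal sketch of how the streamed chunks could be read directly from Ollama while tolerating the extra "thinking" field. It is not Agent-Zero's or LiteLLM's code. The helper names `split_chunks` and `generate` and the 1800-second timeout are my own assumptions; the endpoint, `keep_alive` value, and chunk shape are taken from the details above.

```python
import json
import urllib.request
from typing import Iterable, Tuple

OLLAMA_URL = "http://127.0.0.1:11434"  # same api_base as in the post


def split_chunks(lines: Iterable[str]) -> Tuple[str, str]:
    """Collect 'thinking' and 'response' text from newline-delimited JSON chunks.

    Hypothetical helper: chunks without a 'thinking' key are handled the same
    way as chunks that have one, so other models keep working.
    """
    thinking, answer = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        thinking.append(chunk.get("thinking") or "")
        answer.append(chunk.get("response") or "")
        if chunk.get("done"):
            break
    return "".join(thinking), "".join(answer)


def generate(prompt: str, model: str = "gpt-oss:20b",
             timeout: float = 1800.0) -> Tuple[str, str]:
    """Stream one completion from Ollama's /api/generate endpoint.

    Assumption: 'timeout' is the socket timeout for each blocking read,
    chosen generously here to survive a slow cold start.
    """
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": True,
        "keep_alive": "1h",  # keep the model loaded between calls
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The HTTP response object yields one bytes line per JSON chunk.
        return split_chunks(raw.decode("utf-8") for raw in response)
```

Calling `generate("Say hello")` with Ollama running should return a `(thinking, answer)` pair. If that works while Agent-Zero still fails, the problem is more likely in the chunk parsing layer than in Ollama itself.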