Kelvin Leonard Mendoza Ceballo

Hi everyone,

I’m running Agent-Zero v0.9.5 together with Ollama locally on Windows 11, and I’m having trouble when using the gpt-oss:20b model.

The main issue is a timeout after 600 seconds with no response:

litellm.exceptions.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out. Timeout passed=600.0, time taken=600.011 seconds

In other cases, when streaming is enabled, LiteLLM fails while parsing the response chunks because the model outputs an additional "thinking" field:

Exception: Unable to parse ollama chunk - {'model': 'gpt-oss:20b', 'response': '', 'thinking': 'We', 'done': False}

With other models like gemma3:12b-it-qat or qwen2.5-coder, everything works fine. The problem seems limited to gpt-oss:20b and, at times, its long cold-start latency.

Things I’ve already tried:
- Pointing api_base to http://127.0.0.1:11434 (works fine for other models).
- Forcing stream: false only for gpt-oss:20b.
- Warming up the model with keep_alive: "1h" and a reduced num_predict.
- Increasing the timeout above 600s.

My question is:

👉 Is there any recommended configuration or official integration in Agent-Zero for handling Ollama/LM Studio models that emit extra fields like thinking, or that need a very long time to produce the first token?

I’d like to use gpt-oss:20b locally on Windows 11 without breaking the general functionality of the agent, and still keep streaming enabled for other models.

Thanks in advance for any guidance or a simpler solution!
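
For reference, here is a minimal sketch of the kind of tolerant chunk handling I have in mind. It is not Agent-Zero or LiteLLM code: the function name collect_stream is hypothetical, and the chunk shape (newline-delimited JSON with "response", an optional "thinking", and "done") is assumed from the error message above. It only shows that the extra field can be collected separately instead of causing a parse failure.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate answer and reasoning text from Ollama-style NDJSON chunks.

    Hypothetical helper: it assumes each line is one JSON object that may
    carry a "response" field, a "thinking" field, both, or neither.
    """
    answer_parts = []
    thinking_parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        # Missing or null fields are treated as empty strings, not errors.
        thinking_parts.append(chunk.get("thinking") or "")
        answer_parts.append(chunk.get("response") or "")
        if chunk.get("done"):
            break
    return "".join(answer_parts), "".join(thinking_parts)


# Sample chunks modelled on the one in the error message (made-up content).
sample = [
    '{"model": "gpt-oss:20b", "response": "", "thinking": "We", "done": false}',
    '{"model": "gpt-oss:20b", "response": "", "thinking": " need", "done": false}',
    '{"model": "gpt-oss:20b", "response": "Hello", "done": false}',
    '{"model": "gpt-oss:20b", "response": "", "done": true}',
]

answer, thinking = collect_stream(sample)
print("answer:", answer)
print("thinking:", thinking)
```

With handling like this, the "thinking" text could be logged or dropped while only the "response" text is passed on to the agent.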
