Feature Release: Chat History Token Optimization
So, when using your own OpenAI key (and even for us as a business), you notice that the agent stack (tools, prompt, conversation history, RAG, etc.) starts to stack up quick - especially if you have a really involved process.
We implemented a token optimization layer that runs before our chat completions to make sure you get the cost savings, and I'll share some data at the end :)
First, we are now truncating and summarizing conversation history. We noticed large chat completions coming through with 300-400+ message histories. This gets expensive over time if it's a lead you've been working or following up with for a while, so we are reducing that message count and summarizing the older history - the intelligence stays the same, but the token consumption goes way down (98% decrease on larger runs).
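To make that concrete, here is a minimal sketch of the summarize-then-truncate idea in Python. This is not our exact implementation - the message format is the standard OpenAI chat format, and KEEP_RECENT, the summarizer model, and the prompt are made-up examples:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
KEEP_RECENT = 20   # hypothetical: how many recent messages to keep verbatim

def compact_history(messages: list[dict]) -> list[dict]:
    """Summarize everything except the most recent turns."""
    if len(messages) <= KEEP_RECENT:
        return messages

    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m.get('content') or ''}" for m in older)

    # One cheap call turns hundreds of old messages into a short summary.
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize this conversation so an agent "
             "can continue it. Keep names, dates, commitments, and open tasks."},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content

    # In production you would also make sure the cut point does not separate an
    # assistant tool_calls message from its matching tool results.
    return [{"role": "system",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent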
Another thing we are doing is truncating large tool call outputs in the window that are not relevant to the current task. Meaning, if there are tool calls with large outputs (like get_availability) that don't matter for the task at hand, we truncate the response so the agent still sees that the action happened, but the context is much shorter. This saw a huge reduction in token consumption as well (96% decrease on larger runs).
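Roughly, that looks like the sketch below. Again, not our exact code - is_relevant and MAX_TOOL_CHARS are hypothetical stand-ins for whatever relevance check and size cap you would actually use:

MAX_TOOL_CHARS = 200  # hypothetical cap for tool outputs we deem stale

def is_relevant(message: dict, current_task: str) -> bool:
    # Naive stand-in: keep the full output only if the tool's name
    # appears in the current task description.
    return (message.get("name") or "") in current_task

def truncate_stale_tool_outputs(messages: list[dict], current_task: str) -> list[dict]:
    compacted = []
    for m in messages:
        if m.get("role") == "tool" and not is_relevant(m, current_task):
            body = (m.get("content") or "")[:MAX_TOOL_CHARS]
            # Keep a stub so the agent still sees the action happened.
            m = {**m, "content": body + "\n[output truncated - action already completed]"}
        compacted.append(m)
    return compacted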
Here is the before and after. This is the exact same conversation history, assistant ID, tools, custom fields, knowledge base, etc. - but look at the speed and cost difference, and the output was the exact same message:
Differences:
  • ~30 seconds faster (32.692s → 2.618s)
  • 95.95% cheaper ($0.353584 → $0.01432)
----
Before:
{
  "error_type": null,
  "usage_cost": {
    "notes": null,
    "tokens": {
      "output": 211,
      "input_total": 175948,
      "input_cached": 0,
      "input_noncached": 175948
    },
    "total_cost": 0.353584,
    "model_normalized": "gpt-4o",
    "models_encountered": [
      "gpt-4o"
    ],
    "price_used_per_million": {
      "input": 2.5,
      "cached_input": 1.25,
      "output": 10
    }
  },
  "error_message": null,
  "run_time_seconds": 32.692,
  "returned_an_error": false
}
After:
{
  "run_time_seconds": 2.618,
  "returned_an_error": false,
  "error_message": null,
  "error_type": null,
  "usage_cost": {
    "tokens": {
      "input_total": 5488,
      "input_cached": 0,
      "input_noncached": 5488,
      "output": 60
    },
    "total_cost": 0.01432,
    "price_used_per_million": {
      "input": 2.5,
      "cached_input": 1.25,
      "output": 10
    },
    "model_normalized": "gpt-4o",
    "models_encountered": [
      "gpt-4o"
    ],
    "notes": null
  }
}
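If you want to check the math on the After run, it is just tokens times the listed per-million prices:

input_cost  = 5488 * 2.5 / 1_000_000  # noncached input tokens
output_cost = 60 * 10 / 1_000_000     # output tokens
print(round(input_cost + output_cost, 5))  # 0.01432, matches total_cost
print(round(1 - 0.01432 / 0.353584, 4))    # 0.9595 -> 95.95% cheaper
print(round(32.692 - 2.618, 3))            # 30.074 seconds faster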