Feature Release: Chat History Token Optimization
So, when using your own OpenAI key (and even for us as a business), you notice that the agent stack (tools, prompt, conversation history, RAG, etc.) starts to stack up quick - especially if you have a really involved process.
We implemented a token optimization layer that runs before our chat completions to make sure you get the cost savings, and I'll share some data at the end :)
First, we are now truncating and summarizing conversation history. We noticed large chat completions coming through with 300-400+ message histories. This gets expensive over time if it's a lead you've been working or following up with for a while, so we are reducing that message count and summarizing the older history - the intelligence stays the same, but the token consumption goes way down (98% decrease on larger runs).
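To make that concrete, here is a minimal sketch of the summarize-then-truncate idea in Python. This is not our exact implementation - the message format is the standard OpenAI chat format, and KEEP_RECENT, the summarizer model, and the prompt are made-up examples:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
KEEP_RECENT = 20   # hypothetical: how many recent messages to keep verbatim

def compact_history(messages: list[dict]) -> list[dict]:
    """Summarize everything except the most recent turns."""
    if len(messages) <= KEEP_RECENT:
        return messages

    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m.get('content') or ''}" for m in older)

    # One cheap call turns hundreds of old messages into a short summary.
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize this conversation so an agent "
             "can continue it. Keep names, dates, commitments, and open tasks."},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content

    # In production you would also make sure the cut point does not separate an
    # assistant tool_calls message from its matching tool results.
    return [{"role": "system",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent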
Another thing we are doing is truncating large tool call outputs in the window that are not relevant to the current task. Meaning, if there are tool calls with large outputs (like get_availability) that don't matter for the task at hand, we truncate the response so the agent still sees that the action happened, but the context is much shorter. This saw a huge reduction in token consumption as well (96% decrease on larger runs).
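Roughly, that looks like the sketch below. Again, not our exact code - is_relevant and MAX_TOOL_CHARS are hypothetical stand-ins for whatever relevance check and size cap you would actually use:

MAX_TOOL_CHARS = 200  # hypothetical cap for tool outputs we deem stale

def is_relevant(message: dict, current_task: str) -> bool:
    # Naive stand-in: keep the full output only if the tool's name
    # appears in the current task description.
    return (message.get("name") or "") in current_task

def truncate_stale_tool_outputs(messages: list[dict], current_task: str) -> list[dict]:
    compacted = []
    for m in messages:
        if m.get("role") == "tool" and not is_relevant(m, current_task):
            body = (m.get("content") or "")[:MAX_TOOL_CHARS]
            # Keep a stub so the agent still sees the action happened.
            m = {**m, "content": body + "\n[output truncated - action already completed]"}
        compacted.append(m)
    return compacted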
Here is the before and after. This is the exact same conversation history, assistant ID, tools, custom fields, knowledge base, etc. - but look at the speed and cost difference, and the output was the exact same message:
Differences:
  • ~30 seconds faster (32.692s → 2.618s)
  • 95.95% cheaper ($0.353584 → $0.01432)
----
Before:
{
  "error_type": null,
  "usage_cost": {
    "notes": null,
    "tokens": {
      "output": 211,
      "input_total": 175948,
      "input_cached": 0,
      "input_noncached": 175948
    },
    "total_cost": 0.353584,
    "model_normalized": "gpt-4o",
    "models_encountered": [
      "gpt-4o"
    ],
    "price_used_per_million": {
      "input": 2.5,
      "cached_input": 1.25,
      "output": 10
    }
  },
  "error_message": null,
  "run_time_seconds": 32.692,
  "returned_an_error": false
}
After:
{
  "run_time_seconds": 2.618,
  "returned_an_error": false,
  "error_message": null,
  "error_type": null,
  "usage_cost": {
    "tokens": {
      "input_total": 5488,
      "input_cached": 0,
      "input_noncached": 5488,
      "output": 60
    },
    "total_cost": 0.01432,
    "price_used_per_million": {
      "input": 2.5,
      "cached_input": 1.25,
      "output": 10
    },
    "model_normalized": "gpt-4o",
    "models_encountered": [
      "gpt-4o"
    ],
    "notes": null
  }
}
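If you want to check the math on the After run, it is just tokens times the listed per-million prices:

input_cost  = 5488 * 2.5 / 1_000_000  # noncached input tokens
output_cost = 60 * 10 / 1_000_000     # output tokens
print(round(input_cost + output_cost, 5))  # 0.01432, matches total_cost
print(round(1 - 0.01432 / 0.353584, 4))    # 0.9595 -> 95.95% cheaper
print(round(32.692 - 2.618, 3))            # 30.074 seconds faster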