Some additional documentation in case you're venturing into the world of tool integrations.
As I was gearing up for a massive inference run for a proof of concept, I noticed something fishy about my Langfuse output... latency, time to first token, and max tokens weren't being logged. The culprit: when I sat down yesterday morning, I saw an update in the Langfuse repository and, without thinking about it, instantly upgraded my local installation.
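For context, the integration in question is LiteLLM's Langfuse success callback. Here's a minimal sketch of the wiring (the keys, host, and prompt are placeholders, not my actual setup):

```python
import os
import litellm

# Placeholder credentials -- swap in your own Langfuse and Anthropic keys
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "http://localhost:3000"  # self-hosted; or https://cloud.langfuse.com
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

# Tell LiteLLM to log every successful completion to Langfuse
litellm.success_callback = ["langfuse"]

response = litellm.completion(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Extract the key skills from this job description: ..."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

It's the traces from calls like this that were suddenly missing latency, time to first token, and max tokens after the upgrade.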
After over 5 hours of digging through both the LiteLLM AND Langfuse codebases, I finally decided to do the reasonable thing and figure out which versions actually work together.
I landed on ANY version of LiteLLM with Langfuse <= 2.21.*.
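If you want to lock that in, a requirements.txt along these lines should do it (a sketch; I'm using pip's version specifier to express "anything up to and including 2.21.x"):

```
# requirements.txt -- keep Langfuse at or below the 2.21.x series
langfuse<2.22
litellm
```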
I got lucky here: I hadn't upgraded my main environment, i.e. the one that wasn't virtual, so I was able to pip freeze it and recover the working versions. The lesson learned is that tracking which package versions you're using before you make any upgrades is absolutely crucial! It also taught me, again, that although libraries make some things easier, they can cost you 5+ hours when you don't understand the codebase of the tools you're using.
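If you ever need to pull off the same rescue, capture the known-good environment before touching anything:

```
pip freeze > requirements-known-good.txt
```

(`requirements-known-good.txt` is just a name I made up; any filename works.)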
Anyhow, here is the output from my inference on 5 job descriptions that I scraped from Indeed using 'Ai Engineer' as the query. Notice latency isn't tracked... but look at how cheap Haiku is! :D Happy building, folks!