Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.
Unlock the Power of LLM Evaluation with Lighteval 🚀
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends (transformers, tgi, vllm, or nanotron) with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up. Customization is at your fingertips: browse our existing tasks and metrics, or effortlessly create your own, tailored to your needs. Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.