I wanted to apply what I'm learning in the "Building Applications with LLMs" course without worrying too much about costs, so I started running models locally with Ollama. Since I'm just prototyping, I don't need top-tier models; I'll save those for production (one day). My setup isn't the fastest, so things were pretty slow at first, but learning about quantization made a real difference in performance. Hopefully this helps others in a similar situation.
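In case it's useful, here's a minimal sketch of what my setup looks like, assuming the Ollama server is running locally and you've pulled a quantized model. The specific tag (`llama3.1:8b-instruct-q4_K_M`) is just an example of a 4-bit quantized variant; check the Ollama model library for the tags actually available for your model of choice.

```python
# Minimal sketch using the official ollama Python package (pip install ollama).
# Assumes Ollama is running locally and a quantized model has been pulled first,
# e.g. with: ollama pull llama3.1:8b-instruct-q4_K_M
# (example tag only; browse the Ollama library for other quantization levels)
import ollama

response = ollama.chat(
    model="llama3.1:8b-instruct-q4_K_M",  # q4_K_M = 4-bit quantization
    messages=[
        {"role": "user", "content": "In one sentence, what does quantization do to an LLM?"}
    ],
)
print(response["message"]["content"])
```

The rough trade-off, as I understand it: lower-bit quantized variants (q4, q5) give up a little output quality in exchange for much lower memory use and faster inference, which is usually the right call for prototyping on modest hardware.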
This video explained it well for me as a starting point; maybe it will help you too.