But even though I have an M3 MacBook Pro, I can't recommend running it locally like this. It's super slow. The llama3.2 model performed a bit better, but it's still not really practical compared to using OpenRouter.
Not sure why, since I get pretty good results running the same models inside LM Studio. The 14B model runs just fine in LM Studio, while the 7B model inside n8n is extremely slow.
Asking a simple question in n8n with the 7B model takes 10-15 minutes. Asking the same question in LM Studio with the 14B model takes about a minute, and I can watch it reason as the results stream in, so there's no waiting at all.
So why is it so slow in n8n? Is it Docker? My guess is that Docker on macOS runs containers inside a Linux VM with no access to the Metal GPU, so the model falls back to CPU inside the container, while LM Studio runs natively with GPU acceleration. Can anyone confirm?
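In case it helps anyone reproduce this, here's a rough Python sketch I'd use to compare raw tokens/sec between the two servers. I'm assuming the n8n side is backed by Ollama (that's what the n8n AI starter kit uses; swap in your backend's endpoint if yours differs), that both servers are on their default ports (11434 for Ollama, 1234 for LM Studio), and the 14B model name is just a placeholder for whatever you loaded in LM Studio:

```python
import time
import requests

PROMPT = "Explain what a Docker volume is in two sentences."

def time_ollama(model="llama3.2", host="http://localhost:11434"):
    # Ollama's non-streaming /api/generate response reports eval_count,
    # the number of tokens it generated.
    t0 = time.time()
    r = requests.post(f"{host}/api/generate",
                      json={"model": model, "prompt": PROMPT, "stream": False},
                      timeout=1800)
    r.raise_for_status()
    elapsed = time.time() - t0
    tokens = r.json().get("eval_count", 0)
    print(f"Ollama {model}: {tokens} tokens in {elapsed:.1f}s "
          f"({tokens / elapsed:.1f} tok/s)")

def time_lmstudio(model="qwen2.5-14b-instruct",  # placeholder model name
                  host="http://localhost:1234"):
    # LM Studio exposes an OpenAI-compatible API; usage.completion_tokens
    # gives the generated token count.
    t0 = time.time()
    r = requests.post(f"{host}/v1/chat/completions",
                      json={"model": model,
                            "messages": [{"role": "user", "content": PROMPT}]},
                      timeout=1800)
    r.raise_for_status()
    elapsed = time.time() - t0
    tokens = r.json()["usage"]["completion_tokens"]
    print(f"LM Studio {model}: {tokens} tokens in {elapsed:.1f}s "
          f"({tokens / elapsed:.1f} tok/s)")

if __name__ == "__main__":
    time_ollama()
    time_lmstudio()
```

If the containerized number comes out dramatically lower, that would back up the Docker theory. The usual workaround I've seen is to run Ollama natively on the Mac and point n8n's container at `http://host.docker.internal:11434` instead of a bundled Ollama container, so inference happens on the host GPU.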