My first impressions on Moondream (released yesterday)
I've just spent an hour playing with Moondream, a new tiny vision language model, and it's really impressive how well and how fast something this small can work. I ran it on a number of random images and it correctly identified them every time. In one photo it even recognized an avocado in the dish that I hadn't noticed at all. Here's its full answer for that photo (see the photo below):
```
> What's in the picture, please describe
The image features a bowl of food, containing rice and a mixture of vegetables, placed on a table. The bowl is filled with a variety of ingredients, including rice, avocado, and possibly other vegetables. There are also two pieces of chicken in the bowl, adding to the meal's overall composition. A fork is resting on the table next to the bowl, ready to be used to enjoy the delicious dish.
```
I'd say that's pretty darn impressive for such a tiny model. The sky is the limit for what one can do with this...
The full description from the README: `1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset. Weights are licensed under CC-BY-SA due to using the LLaVA dataset.`
Frankly, I'm pretty new to all of this, so I have no idea what about 30% of the words above mean 😃, but I had no problems installing it locally on my M1 Mac from GitHub and running it interactively (just follow the README; it works on the first run, and all you need is Python installed on your machine). You can also try it on Hugging Face, which is even easier.
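If you're curious what running it looks like in code, here's a minimal sketch along the lines of the Hugging Face usage shown in the README, assuming the model id `vikhyatk/moondream1` and that `transformers` and `Pillow` are installed (the image filename is a stand-in for your own photo):

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Model id as published on Hugging Face (assumption: vikhyatk/moondream1)
model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Encode the image once, then ask a question about it
image = Image.open("bowl_of_food.jpg")  # hypothetical path to the photo above
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "What's in the picture, please describe", tokenizer))
```

Since the image is encoded separately, you can keep asking follow-up questions about the same photo without re-encoding it each time.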
Check it out, it's great fun!