Vision Locally with Llava
Just implemented the ability to upload an image, so a multimodal model like Llava can now be used to analyze and describe images. For the most part, Llava does pretty well at describing images, and with some prompting it can generate output in the format you want.
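If you want to try this outside the app, here is a minimal sketch of how an image can be attached to a Llava request through Ollama's /api/chat endpoint (http://127.0.0.1:11434/api/chat by default). Ollama's chat API accepts images as a list of base64-encoded strings in a message's `images` field. The helper name `build_llava_payload` and the model tag `llava:latest` are assumptions. The sketch only builds the payload, which can then be POSTed like any other chat request.

```python
import base64
import json


def build_llava_payload(image_bytes: bytes, prompt: str,
                        model: str = "llava:latest") -> dict:
    """Build a non-streaming Ollama /api/chat payload with one attached image.

    Hypothetical helper: Ollama expects images as base64 strings in the
    message's "images" list; the model tag "llava:latest" is an assumption.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]},
        ],
        "stream": False,
    }


if __name__ == "__main__":
    # Stand-in bytes; in practice read the uploaded file with open(path, "rb").read()
    payload = build_llava_payload(b"\x89PNG fake image bytes",
                                  "Describe this image in two sentences.")
    print(json.dumps(payload)[:120])
```

Prompt tweaks, such as asking for a fixed output format like a bulleted list of objects, go in the `content` string.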
Utilizing Vision with Videos????
I stumbled across this article in the OpenAI Cookbook that shows how to use vision with videos (https://cookbook.openai.com/examples/gpt_with_vision_for_video_understanding). After reading through it, I learned a lot, and it seemed like a pretty straightforward way of processing videos with LLMs: take the video, cut it into frames, and then pass a subset of those frames to the model to generate some kind of commentary or description. Any ideas for what this could be used for? I'm thinking maybe a live commentator for sports, or movie critiques 🤔
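The "cut it into frames and pass some of them" step can be sketched roughly like this. This is a minimal sketch that assumes OpenCV (`opencv-python`) for decoding and an Ollama-hosted multimodal model. The helper names, the default of 10 frames, and the prompt are hypothetical choices. Whether a local model like llava handles several images in one request well is something to test rather than assume. Frames are spread evenly across the video instead of using a fixed stride, so both short and long clips yield a bounded number of images.

```python
import base64
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int) -> List[int]:
    """Pick up to max_frames frame indices spread evenly across the video."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]


def read_sampled_frames(path: str, max_frames: int = 10) -> List[str]:
    """Decode a video with OpenCV and return the sampled frames as base64 JPEGs."""
    import cv2  # requires opencv-python; imported here so the sampler works without it

    cap = cv2.VideoCapture(path)
    # CAP_PROP_FRAME_COUNT is only an estimate for some containers; if the
    # real video is shorter, cap.read() fails and the loop stops early.
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_frame_indices(total, max_frames))
    last = max(wanted, default=-1)
    frames: List[str] = []
    index = 0
    while index <= last:
        ok, frame = cap.read()
        if not ok:
            break
        if index in wanted:
            encoded_ok, buffer = cv2.imencode(".jpg", frame)
            if encoded_ok:
                frames.append(base64.b64encode(buffer).decode("utf-8"))
        index += 1
    cap.release()
    return frames


def build_video_payload(frames_b64: List[str], model: str = "llava:latest") -> dict:
    """Hypothetical Ollama /api/chat payload asking for commentary on the frames."""
    prompt = ("These are frames sampled in order from a video. "
              "Give a short play-by-play commentary of what happens.")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt, "images": frames_b64}],
        "stream": False,
    }
```

For something like live sports commentary, the same idea would run on a rolling window of recent frames rather than on a whole file.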
React App to interface with locally hosted LLMs
Figured I would share this app I have been making, mainly to learn React. I host large language models locally using Ollama. With Ollama, you can download and host open-source models for chat completions and generation, as well as open-source embedding models. It exposes an HTTP API (bound to localhost by default) that accepts a payload structured much like OpenAI's.

Endpoint: http://127.0.0.1:11434/api/chat
Payload: {"model":"llama3.1:latest","messages":[{"role":"user","content":"Hello there!"}],"stream":false}

This app is just a chatbot UI built with React, Tailwind CSS, and Radix UI components, backed by a Python Django REST API that stores conversations, messages, available models, user data, etc. Aside from powering the chatbot itself, LLMs are also used for other features, like generating meaningful titles for your conversations based on their messages. There are also open-source multimodal models, like llava (ollama.com), which I could use for analyzing images in the future. I also run image generation models like Stable Diffusion locally, which could be used to generate images for users.

Code: https://github.com/ethanlchristensen/BS-LLM-WebUI
Inspiration: nulzo/Ollama-WebUI (github.com): "Interact with LLMs, with an emphasis on privacy."
Tailwind CSS: "Rapidly build modern websites without ever leaving your HTML."
Radix UI: radix-ui.com
Ollama: ollama.com
Django REST framework: django-rest-framework.org
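Here is a rough sketch of calling the endpoint above with the same payload, plus one way the conversation-title feature could sit on top of it. It uses only the Python standard library. It relies on the fact that a non-streaming Ollama chat response carries the reply under `message.content`. The function names, the title prompt, and the six-word limit are hypothetical, not taken from the BS-LLM-WebUI code.

```python
import json
from typing import Dict, List
from urllib.request import Request, urlopen

OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"  # Ollama's default local address


def build_chat_payload(messages: List[Dict[str, str]],
                       model: str = "llama3.1:latest") -> dict:
    """Same shape as the example payload: model, messages, stream disabled."""
    return {"model": model, "messages": messages, "stream": False}


def chat(messages: List[Dict[str, str]], model: str = "llama3.1:latest",
         url: str = OLLAMA_CHAT_URL) -> str:
    """POST to /api/chat and return the assistant's reply text."""
    payload = build_chat_payload(messages, model)
    req = Request(url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["message"]["content"]


def build_title_messages(conversation: List[Dict[str, str]],
                         max_chars: int = 2000) -> List[Dict[str, str]]:
    """Hypothetical prompt asking the model to title a conversation."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
    return [
        {"role": "system",
         "content": "Reply with a short title (at most six words) for this "
                    "conversation. Reply with the title only."},
        {"role": "user", "content": transcript[:max_chars]},
    ]


def clean_title(raw: str) -> str:
    """Keep the first line of the reply and strip surrounding quotes."""
    first = raw.strip().splitlines()[0] if raw.strip() else ""
    return first.strip().strip("\"'").strip()


def generate_title(conversation: List[Dict[str, str]],
                   model: str = "llama3.1:latest") -> str:
    return clean_title(chat(build_title_messages(conversation), model))
```

With Ollama running and the model pulled, `chat([{"role": "user", "content": "Hello there!"}])` sends exactly the example payload. `clean_title` is there because small local models often wrap the title in quotes or add extra lines.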
Enhancing Screen Readers With Vision
What's up everyone! The other day @Andrew MacMaster suggested a great use case for multimodal LLMs: adding vision to screen readers, which are used by people with visual impairments. That way, if an image on a webpage has no description, there is still a way to explain it instead of skipping over potentially crucial information. Building on this idea, I wrote a Python script that mimics a screen reader and adds vision using GPT-4o.

Framework:
1. Grab the page's HTML.
2. Find the img tags.
3. Use GPT-4o mini to describe the images.
4. Replace the img tags with the generated text, inserted as paragraph tags.
5. Send all of the text to a text-to-speech model.

Limitations:
1. I'm using a generic text-to-speech model, so it's extremely monotone and will make you hate your life.
2. The image extraction isn't dynamic, so it only works on the one website I was testing it on. Realistically, this logic would need to be built into a real screen reader that can scrape pages properly.

This was a fun little project that took me less than a day to implement, and it's interesting to see how AI can push accessibility technology forward. Let me know your thoughts!
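Steps 2 to 4 of that framework can be sketched with Python's built-in `html.parser`. This is a minimal sketch, not the script described above. `describe_image` is a hypothetical stand-in for the GPT-4o mini call. The sketch keeps an image's existing alt text and only asks the model when alt text is missing, which matches the original motivation. Instead of rewriting the HTML, it emits the page as ordered lines of text, with each image as its own line, ready to hand to a text-to-speech model. Script and style contents are skipped.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


class ScreenReaderParser(HTMLParser):
    """Collect readable text in document order, turning <img> tags into text."""

    SKIP = {"script", "style"}

    def __init__(self, describe_image: Callable[[str], str]):
        super().__init__()
        self.describe_image = describe_image  # hypothetical vision-model call
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]):
        if tag in self.SKIP:
            self._skip_depth += 1
            return
        if tag == "img":
            attr = dict(attrs)
            alt = (attr.get("alt") or "").strip()
            src = attr.get("src") or ""
            # Only call the model when the page gives no description.
            text = alt if alt else self.describe_image(src)
            self.chunks.append(f"Image: {text}")

    def handle_endtag(self, tag: str):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data: str):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_speech_text(html: str, describe_image: Callable[[str], str]) -> str:
    """Return the page's text, one chunk per line, with images described."""
    parser = ScreenReaderParser(describe_image)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because the vision call is injected as a function, the same parser works with GPT-4o mini, a local llava model, or a stub for testing. Relative `src` values would still need to be resolved against the page URL before the image can be fetched.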
Trying something new
What’s up everyone! I’m creating this group to help connect us with like-minded individuals in the industry so we can chat, ask questions, and help each other with anything AI-related. I want this space to be open for now, and once we have a lot of members we can look into closing it off and making it a more exclusive group. Feel free to post anything cool you’re working on outside of work. Also, send the invite link around to whoever you think would benefit from this community.
AI Think Tank
skool.com/rk-software-services-2370
Let’s build a community of AI Enthusiasts