Hi, is there any plan to add multimodal capabilities to Assistable? Specifically, allowing the AI to:
- Interpret images (screenshots, documents, photos)
- Understand audio messages (e.g., transcribe voice notes)
- Respond using that context
In real-world use cases (sales, support, customer service), users often send voice notes and images instead of text, so this would be a huge upgrade.
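To make the image case concrete, here is a rough sketch of the kind of request shape I have in mind. I'm using the OpenAI-style multimodal chat message format purely as a hypothetical illustration; I'm not assuming anything about Assistable's current API:

```python
import base64

# Hypothetical illustration only: combine a user's text with an attached
# image in a single multimodal message (OpenAI-style payload shape).
def build_image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }

# Example: a customer sends a screenshot along with a question.
msg = build_image_message("What does this error screenshot say?", b"\x89PNG\r\n...")
print(msg["content"][0]["text"])
```

The same idea would apply to audio: attach the voice note, and the assistant uses it as context when replying.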
Is this on the roadmap, under evaluation, or currently not planned?