Google is building toward AI that understands text, images, video, audio, and documents all in one system.
That means it can connect meaning across different types of content instead of treating each one separately.
📊 Supports up to 6 images and 120 minutes of video or audio in one query
⚙️ Includes Matryoshka Representation Learning (MRL), so embeddings can be shortened to fewer dimensions for cheaper storage and faster search
🚀 A major step for multimodal search and retrieval
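For anyone wondering what MRL looks like in practice, here is a minimal sketch of the general idea: keep only the first few dimensions of an embedding, then rescale it to unit length before comparing. The vectors are made-up toy values, and `truncate_embedding` and `cosine` are hypothetical helper names for illustration, not part of any Google API.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 8-dimensional vectors standing in for real model output
# (assumed values, chosen only for illustration).
query = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
doc = [0.8, 0.4, 0.1, 0.2, 0.01, 0.03, 0.02, 0.01]

full = cosine(query, doc)
short = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
print(f"full: {full:.3f}  truncated to 4 dims: {short:.3f}")
```

In this toy case the early dimensions carry most of the signal, so the similarity score barely moves after truncation. A model trained with MRL is designed to behave that way, which is what makes the smaller vectors usable.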
This is where AI search is heading.