This workflow creates a REST API endpoint that extracts text from images using Google's Gemini AI. A Webhook node listens for POST requests at the /extract-text endpoint. When an image file is uploaded to this endpoint, the "Extract from File" node converts the binary image data to a base64 string. The base64-encoded image is then sent to Google's Gemini 1.5 Flash API together with a prompt requesting text extraction. The raw extracted text is passed to an AI Agent, also powered by Gemini, which arranges the text in the correct reading order and formats it. The final formatted text is returned as the webhook response.
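The Gemini OCR request in this flow can be sketched in Python as follows. This is a minimal sketch, assuming the public Gemini REST `generateContent` endpoint and API-key authentication through the `x-goog-api-key` header. The prompt text and the helper names (`build_ocr_request`, `extract_text`, `call_gemini`) are hypothetical, because the workflow's exact node configuration is not shown here. The sketch shows how the base64 image travels as an `inline_data` part next to the text prompt, and how the extracted text is read back from the first candidate of the response.

```python
import base64
import json
import urllib.request

# Assumption: public Gemini REST endpoint for the model named in the workflow.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-flash:generateContent"
)

# Hypothetical prompt; the workflow's actual prompt wording is not given.
OCR_PROMPT = "Extract all text from this image."


def build_ocr_request(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent request body with the image inlined as base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": OCR_PROMPT},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


def call_gemini(body: dict, api_key: str) -> dict:
    """POST the request body to Gemini and return the decoded JSON response."""
    request = urllib.request.Request(
        GEMINI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate in a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


if __name__ == "__main__":
    # Offline demo: build a request and parse a hand-written sample response.
    body = build_ocr_request(b"\x89PNG fake bytes", mime_type="image/png")
    print(list(body["contents"][0]["parts"][1]["inline_data"]))
    sample = {"candidates": [{"content": {"parts": [{"text": "Hello"}, {"text": " world"}]}}]}
    print(extract_text(sample))  # prints "Hello world"
```

With a real key, `extract_text(call_gemini(build_ocr_request(image_bytes), api_key))` would return the raw OCR text that the workflow then hands to the AI Agent.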
Key Components:
- Webhook trigger that exposes the POST /extract-text endpoint
- Binary-to-base64 conversion of the uploaded image
- Google Gemini API call that performs the OCR
- AI Agent that reorders and formats the extracted text
- Synchronous response: the formatted text is returned to the caller in the same request
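The component chain listed above can be mirrored as a plain function pipeline. This is an illustrative sketch, not the n8n implementation. `handle_extract_text`, `ocr`, and `formatter` are hypothetical names, and the two Gemini-backed steps (the OCR call and the AI Agent) are passed in as functions so the flow can be exercised without network access.

```python
import base64
from typing import Callable

# Hypothetical signatures for the two Gemini-backed steps.
OcrFn = Callable[[str, str], str]  # (base64_image, mime_type) -> raw text
FormatFn = Callable[[str], str]    # raw text -> text in reading order


def handle_extract_text(image_bytes: bytes, mime_type: str,
                        ocr: OcrFn, formatter: FormatFn) -> str:
    """Mirror the node chain: webhook -> base64 -> OCR -> AI Agent -> response."""
    encoded = base64.b64encode(image_bytes).decode("ascii")  # "Extract from File" step
    raw_text = ocr(encoded, mime_type)                       # Gemini OCR step
    return formatter(raw_text)                               # AI Agent step; returned synchronously


if __name__ == "__main__":
    def fake_ocr(b64: str, mime: str) -> str:
        # Pretend the OCR step returned lines out of reading order.
        return "World\nHello"

    def fake_formatter(raw: str) -> str:
        # Stand-in for the AI Agent: reverse the line order.
        return "\n".join(reversed(raw.splitlines()))

    print(handle_extract_text(b"img", "image/png", fake_ocr, fake_formatter))
    # prints "Hello" and "World" on separate lines
```

Injecting the two steps keeps the orchestration separate from the model calls, so the OCR function from a Gemini client and a formatting function backed by a second Gemini call can be swapped in without changing the pipeline itself.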
Use Cases:
- Document digitization services
- Receipt and invoice processing
- Business card scanning applications
- Extracting text from screenshots
- Converting printed materials to digital format
- Accessibility tools for visually impaired users
- Automated data entry from physical documents