I'm building an automotive damage assessment chatbot workflow in n8n that collects customer information through conversation. I'm facing a challenge with data extraction and storage strategy.
## Current Setup
- **n8n workflow** with AI Agent (Google Gemini)
- **PostgreSQL database** (`claim_sessions` table with ~50 columns)
- **Two data sources:**
1. **OCR extraction** from uploaded documents (ID, vehicle registration, police reports) - field names are known and consistent
2. **Chat conversations** where AI collects data like name, address, accident details, etc. - field names are unpredictable
## The Problem
For OCR data, I can easily normalize field names since I control the OCR prompt. But for chat-collected data, the AI agent might name fields arbitrarily:
- User: "My name is Max Mustermann"
- AI might store as: `{name: "Max Mustermann"}` or `{vorname: "Max", nachname: "Mustermann"}` or `{customer_name: "..."}`
I need consistent field names matching my database schema (e.g., `vorname`, `nachname`, `strasse_nr`, `plz`, `kennzeichen`, etc.).
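Whatever option wins, a thin normalization layer in an n8n Code node can absorb the common aliases before anything touches the DB. A minimal sketch, assuming an alias map you maintain by hand (the alias list and column names below are illustrative, taken from the examples above):

```javascript
// Hypothetical alias map: arbitrary AI field names → canonical DB columns.
const FIELD_ALIASES = {
  name: "nachname",
  customer_name: "nachname",
  first_name: "vorname",
  last_name: "nachname",
  vorname: "vorname",
  nachname: "nachname",
  strasse: "strasse_nr",
  street: "strasse_nr",
  plz: "plz",
  zip: "plz",
  license_plate: "kennzeichen",
  kennzeichen: "kennzeichen",
};

// Map extracted keys to canonical column names; unknown keys are kept
// separately so nothing is silently dropped.
function normalizeFields(extracted) {
  const normalized = {};
  const unknown = {};
  for (const [key, value] of Object.entries(extracted)) {
    const canonical = FIELD_ALIASES[key.toLowerCase()];
    if (canonical) normalized[canonical] = value;
    else unknown[key] = value;
  }
  return { normalized, unknown };
}
```

Logging the `unknown` bucket over a few dozen real conversations tells you which aliases the model actually produces, so the map converges quickly.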
## Options I'm Considering
### Option 1: Structured Output Parser
- Add Output Parser node to AI Agent
- Force the AI to output JSON matching the exact schema on every response
- **Pros:** Guaranteed structure, direct DB insert
- **Cons:** Verbose (AI outputs full schema every message), complex setup, might limit natural conversation
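One way to soften the verbosity concern with Option 1: make every data field nullable and reserve one field for the conversational reply. A sketch of what the parser schema might look like (field names are the illustrative ones from above, not my full column list):

```javascript
// Hypothetical JSON Schema for a Structured Output Parser node.
// Nullable fields let the agent answer naturally even when no new
// data was collected in a given turn; only "reply" is mandatory.
const claimSchema = {
  type: "object",
  properties: {
    reply: { type: "string", description: "Natural-language answer shown to the user" },
    vorname: { type: ["string", "null"] },
    nachname: { type: ["string", "null"] },
    strasse_nr: { type: ["string", "null"] },
    plz: { type: ["string", "null"] },
    kennzeichen: { type: ["string", "null"] },
    // …remaining claim_sessions columns
  },
  required: ["reply"],
};
```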
### Option 2: Post-Processing Data Extraction
- AI Agent generates natural conversation
- After each response, run a separate AI call to extract structured data from the conversation
- Use Claude/Gemini API with prompt: "Extract all mentioned data into this exact JSON schema: {vorname, nachname, ...}"
- **Pros:** Natural conversation, flexible
- **Cons:** Extra API call per message, costs more tokens, slight latency
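For Option 2, the part that needs care isn't the extraction call itself but the merge afterwards: if the extractor returns `null` for fields not mentioned in the latest turn, a null-skipping merge keeps earlier answers from being overwritten. A sketch of that step (prompt text and field names are illustrative):

```javascript
// Illustrative extraction prompt for the secondary AI call.
const EXTRACTION_PROMPT = `Extract all data mentioned in the conversation
into this exact JSON schema, using null for anything not mentioned:
{"vorname": null, "nachname": null, "strasse_nr": null, "plz": null, "kennzeichen": null}`;

// Merge a turn's extraction into the accumulated record:
// non-null values win, nulls/empties leave prior data untouched.
function mergeExtraction(accumulated, extracted) {
  const merged = { ...accumulated };
  for (const [key, value] of Object.entries(extracted)) {
    if (value !== null && value !== undefined && value !== "") {
      merged[key] = value;
    }
  }
  return merged;
}
```

The accumulated record can live in workflow static data or a DB row keyed by session, so each message only pays for one small extraction call.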
### Option 3: Hybrid Approach
- OCR data → Automatic normalization & DB save (works great)
- Chat data → Keep in Chat Memory only
- Final step → AI summarizes all collected info, user confirms, then save to DB
- **Pros:** Simple, works immediately, fewer API calls
- **Cons:** Data only persists once the conversation completes; everything is lost if the session is abandoned midway
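If Option 3's final "user confirmed" step wins, the save can be a single parameterized upsert built from the confirmed record. A sketch, assuming `claim_sessions` has a unique `session_id` column (the column whitelist also keeps AI-invented keys out of the SQL):

```javascript
// Illustrative column whitelist drawn from the schema examples above.
const ALLOWED_COLUMNS = ["vorname", "nachname", "strasse_nr", "plz", "kennzeichen"];

// Build a parameterized INSERT … ON CONFLICT upsert; unknown keys in
// the record are dropped rather than interpolated into the SQL.
function buildUpsert(sessionId, record) {
  const cols = Object.keys(record).filter((c) => ALLOWED_COLUMNS.includes(c));
  const params = [sessionId, ...cols.map((c) => record[c])];
  const placeholders = cols.map((_, i) => `$${i + 2}`);
  const updates = cols.map((c) => `${c} = EXCLUDED.${c}`);
  const sql =
    `INSERT INTO claim_sessions (session_id, ${cols.join(", ")}) ` +
    `VALUES ($1, ${placeholders.join(", ")}) ` +
    `ON CONFLICT (session_id) DO UPDATE SET ${updates.join(", ")}`;
  return { sql, params };
}
```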
### Option 4: In-Prompt Schema Enforcement
- Add detailed field naming instructions to AI system prompt
- List all 50+ database fields with descriptions
- Hope the AI follows the naming convention consistently
- **Pros:** No extra nodes/API calls
- **Cons:** Unreliable, AI might still use different names, hard to validate
## Current Workflow Structure
```
Webhook → Extract Input → Detect Support/Claim Agent
→ [IF has files] → OCR Processing → Normalize OCR Data → Save to DB
→ AI Agent (with Chat Memory) → Send Response
```
## Questions
1. Which approach is most reliable for production use?
2. Is there a better n8n-native solution I'm missing?
3. How do others handle incremental data collection in conversational AI workflows?
4. Would a combination approach work better (e.g., OCR = auto-save, Chat = manual confirmation)?
## Constraints
- Using n8n Cloud (latest version)
- PostgreSQL + Supabase for vector store
- Google Gemini 2.0 Flash as main LLM
- Conversations can be 20-50+ messages long
- Need to support chat history compression (summarize every 20 messages)
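For the compression constraint above, the rolling-window logic itself is simple; only the summarization call is an LLM request. A sketch, with `summarize` standing in for that call (a hypothetical helper, not an n8n built-in):

```javascript
// Once history exceeds the window, fold the oldest `window` messages
// into a single synthetic summary message and keep the rest verbatim.
function compressHistory(messages, summarize, window = 20) {
  if (messages.length <= window) return messages;
  const summary = summarize(messages.slice(0, window));
  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...messages.slice(window),
  ];
}
```

Note that if you go with Option 2 or 3, the structured record survives compression independently, so summarization never has to preserve every collected field verbatim.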
Any advice from those who've built similar conversational data collection workflows?