Templates work great on clean demo PDFs. The client's production documents were rotated, low-quality scans in mixed formats. So I built a preprocessing branch that handles the real-world mess before the main workflow runs.
THE DEMO vs PRODUCTION GAP:
Template testing: Clean PDFs, perfect orientation, clear text, consistent formatting
Client reality: Phone photos of documents, 90° rotations, coffee stains, mixed Word/PDF/images, varying quality
Template broke immediately on production data.
THE PREPROCESSING SOLUTION:
STAGE 1: QUALITY ASSESSMENT
Check document quality before expensive processing → Flag or fix issues → Route accordingly
STAGE 2: AUTO-CORRECTIONS
- Rotation detection and correction
- Quality enhancement for low-res scans
- Format normalization (all to PDF)
- File size optimization
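Format normalization starts with knowing what each file actually is, because file extensions on uploads can't always be trusted. Below is a minimal sketch that identifies files by their leading magic bytes. The function names and action labels (`image_to_pdf`, `office_to_pdf`) are hypothetical. The conversion itself is left to whatever tools the workflow already uses.

```python
# Magic-byte signatures for the input types mentioned in the post.
# DOCX files are ZIP containers, so a "PK" header is only a hint;
# a fuller check would also inspect the archive contents.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),   # little-endian TIFF
    (b"MM\x00*", "tiff"),   # big-endian TIFF
    (b"PK\x03\x04", "zip_container"),  # .docx and other Office formats
]

def sniff_format(header: bytes) -> str:
    """Return a coarse format label from the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

def normalization_action(label: str) -> str:
    """Map a detected format to a hypothetical normalization step."""
    if label == "pdf":
        return "keep"            # already in the target format
    if label in {"png", "jpeg", "tiff"}:
        return "image_to_pdf"    # wrap the image in a PDF page
    if label == "zip_container":
        return "office_to_pdf"   # e.g. a headless office converter
    return "human_review"        # unidentified file: don't guess

def detect_file(path: str) -> str:
    """Read the first 8 bytes of a file on disk and classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Anything unidentified goes to human review rather than through a converter that might silently produce garbage.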
STAGE 3: PROCESSING BRANCH
Clean documents → Standard workflow
Questionable documents → Enhanced processing
Unreadable documents → Human review queue
THE PREPROCESSING NODES:
NODE 1 - QUALITY CHECKER:
Analyzes document: resolution, orientation, format, readability score
Output: quality_score (0-100), issues_detected array, processing_recommendation
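A penalty-based score is one way to produce that output. This sketch assumes that earlier analysis supplies the inputs: estimated DPI, detected rotation, a normalized contrast value, and a format label. Every penalty weight and cutoff below is invented for illustration, except the 80/50 routing bands, which come from the router rules in this post.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    quality_score: int
    issues_detected: list[str] = field(default_factory=list)
    processing_recommendation: str = "standard"

def assess_quality(dpi: int, rotation_deg: int, contrast: float, fmt: str) -> QualityReport:
    """Score a document from 0 to 100 using hypothetical penalties.

    dpi: estimated scan resolution
    rotation_deg: detected text orientation (0, 90, 180, 270)
    contrast: normalized contrast in [0, 1] (assumed precomputed)
    fmt: detected file format label
    """
    score = 100
    issues = []
    if dpi < 150:
        score -= 40
        issues.append("low_resolution")
    elif dpi < 300:
        score -= 15
        issues.append("medium_resolution")
    if rotation_deg % 360 != 0:
        score -= 10  # easy to fix automatically, so only a small penalty
        issues.append("rotated")
    if contrast < 0.3:
        score -= 25
        issues.append("low_contrast")
    if fmt != "pdf":
        score -= 5
        issues.append("needs_normalization")
    score = max(0, min(100, score))

    if score > 80:
        recommendation = "standard"
    elif score >= 50:
        recommendation = "enhanced"
    else:
        recommendation = "human_review"
    return QualityReport(score, issues, recommendation)
```

For example, a 200 DPI phone photo rotated 90° scores 70 and is recommended for enhanced processing.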
NODE 2 - AUTO-FIXER (only runs if needed):
Rotation correction: Detects text orientation, rotates to 0°
Enhancement: Increases contrast, sharpens text, removes noise
Normalization: Converts all formats to standard PDF
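Rotation correction and contrast enhancement can be sketched on a grayscale page held as a 2D list of pixel values. This assumes an upstream orientation detector reports the clockwise rotation in multiples of 90°. Small-angle deskew of tilted pages needs interpolation and is not covered, and neither are sharpening or noise removal. The function names are hypothetical, and a real workflow would use an image library rather than pure Python.

```python
def rotate_90_cw(grid: list[list[int]]) -> list[list[int]]:
    """Rotate a 2D pixel grid 90 degrees clockwise (reverse rows, then transpose)."""
    return [list(row) for row in zip(*grid[::-1])]

def correct_rotation(grid: list[list[int]], detected_cw_deg: int) -> list[list[int]]:
    """Undo a detected clockwise rotation so the text reads at 0 degrees.

    detected_cw_deg is assumed to come from an upstream orientation
    detector and must be a multiple of 90.
    """
    if detected_cw_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are handled here")
    turns = (-detected_cw_deg // 90) % 4  # clockwise quarter-turns that undo it
    for _ in range(turns):
        grid = rotate_90_cw(grid)
    return grid

def stretch_contrast(grid: list[list[int]]) -> list[list[int]]:
    """Linear contrast stretch: darkest pixel -> 0, brightest -> 255."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    if hi == lo:
        return [row[:] for row in grid]  # flat page, nothing to stretch
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in grid]
```

A washed-out scan whose pixels only span 100 to 200 gets spread across the full 0 to 255 range. This is the kind of boost that helps OCR on faded photos.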
NODE 3 - ROUTER:
High quality (>80): Standard processing
Medium quality (50-80): Enhanced processing with higher confidence thresholds
Low quality (<50): Human review with original file attached
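The router's rules translate directly into code. The boundaries follow the post, so scores of exactly 80 and exactly 50 both land in the medium band. The per-branch values in `MIN_CONFIDENCE` are assumptions, since the post only says the medium branch uses higher confidence thresholds.

```python
def route(quality_score: float) -> str:
    """Pick a branch: >80 standard, 50-80 enhanced, <50 human review."""
    if not 0 <= quality_score <= 100:
        raise ValueError(f"quality_score out of range: {quality_score}")
    if quality_score > 80:
        return "standard"
    if quality_score >= 50:
        return "enhanced"
    return "human_review"

# Hypothetical minimum extraction confidence per branch.
# The human_review branch has no entry: nothing there is auto-accepted.
MIN_CONFIDENCE = {"standard": 0.80, "enhanced": 0.90}

def auto_accept(branch: str, confidence: float) -> bool:
    """Accept an extracted field automatically, or leave it for a person."""
    threshold = MIN_CONFIDENCE.get(branch)
    return threshold is not None and confidence >= threshold
```

Because the same extraction confidence can be accepted on a clean document but rejected on a questionable one, medium-quality pages produce fewer silent errors.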
WHAT THIS HANDLES:
Phone photos, rotated scans, mixed formats, poor lighting, tilted documents, multi-page variations
THE NUMBERS:
Documents monthly: 180
Perfect quality: 95 (53%)
Auto-fixed: 68 (38%)
Human review needed: 17 (9%)
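A quick check that the monthly breakdown is internally consistent, with the three buckets summing to 180 and the rounded shares matching the stated percentages:

```python
# Monthly document counts from the breakdown above.
monthly = {"perfect_quality": 95, "auto_fixed": 68, "human_review": 17}

total = sum(monthly.values())
shares = {name: round(100 * count / total) for name, count in monthly.items()}

print(total)   # 180
print(shares)  # {'perfect_quality': 53, 'auto_fixed': 38, 'human_review': 9}
```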
Processing accuracy:
- On demo-quality docs: 98%
- On production docs (without preprocessing): 76%
- On production docs (with preprocessing): 96%
Cost impact:
Preprocessing adds: $0.03 per document
Failed extractions prevented: 40 monthly
Reprocessing cost saved: $120 monthly
Net savings: ~$115 monthly ($120 saved minus $5.40 of preprocessing on 180 documents)
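The cost figures reconcile as follows. `Decimal` avoids floating-point noise in money arithmetic. The per-failure reprocessing cost is derived from the post's numbers rather than stated in it.

```python
from decimal import Decimal

DOCS_PER_MONTH = 180
PREPROCESSING_PER_DOC = Decimal("0.03")
REPROCESSING_SAVED = Decimal("120.00")
FAILURES_PREVENTED = 40

preprocessing_cost = DOCS_PER_MONTH * PREPROCESSING_PER_DOC         # $5.40 per month
net_savings = REPROCESSING_SAVED - preprocessing_cost                # $114.60 per month
implied_cost_per_failure = REPROCESSING_SAVED / FAILURES_PREVENTED   # $3 per failed extraction

print(f"Preprocessing: ${preprocessing_cost}/month")
print(f"Net savings:   ${net_savings}/month")
```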
CONFIGURATION:
Quality Checker: Image analysis checking DPI, orientation, contrast levels
Auto-Fixer: Image processing tools (rotation, enhancement, normalization)
Router: IF conditions on quality_score field
THE LESSON:
Templates built on perfect test data fail in production. Add preprocessing to handle real-world document mess before main workflow.
THE PATTERN:
Quality check → Auto-fix common issues → Route by quality → Process with appropriate confidence thresholds
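The pattern can be sketched as a single orchestration function. This minimal sketch assumes documents are plain dicts and that the caller supplies the check, fix, and branch handlers. Two design choices are assumptions rather than confirmed parts of the template: re-scoring after the fix, and attaching the original file for human review (which mirrors the router's low-quality rule).

```python
from typing import Callable

Handler = Callable[[dict], dict]

def preprocess(doc: dict,
               check: Callable[[dict], int],
               fix: Callable[[dict], dict],
               handlers: dict[str, Handler]) -> dict:
    """Quality check -> auto-fix -> route by quality -> hand off to a branch."""
    score = check(doc)
    # Only run the fixer when the document isn't already high quality.
    fixed = fix(doc) if score <= 80 else doc
    final_score = check(fixed)

    if final_score > 80:
        branch = "standard"
    elif final_score >= 50:
        branch = "enhanced"      # downstream applies stricter confidence thresholds
    else:
        branch = "human_review"

    payload = {"doc": fixed, "quality_score": final_score}
    if branch == "human_review":
        payload["original"] = doc  # reviewers see the untouched upload
    return handlers[branch](payload)
```

Routing on the post-fix score means a rotated but otherwise clean scan can still reach the standard branch once it has been straightened.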
TEMPLATE:
Production-ready document preprocessing system. Quality assessment, auto-correction, intelligent routing, configurable thresholds.
How do your workflows handle messy real-world documents?