Client called at 11 PM: "Workflow takes 45 SECONDS per PDF. We have 10,000 due Monday." Here's how I optimized it to 3 seconds
Client called at 11 PM: "The workflow is taking 45 SECONDS per PDF. We have 10,000 to process by Monday."

Here's how I optimized it to 3 seconds (and saved the weekend):

THE EMERGENCY:

Law firm acquisition - needed to process 10,000 contracts by Monday morning for due diligence.

Current speed: 45 seconds each = 125 hours = impossible

THE DIAGNOSIS (1 hour of profiling):

Where time was wasted:
- PDF download: 2 seconds ✓ (fine)
- Generic text extraction: 38 seconds ❌ (disaster)
- Table parsing: 4 seconds ⚠️ (could improve)
- Database write: 1 second ✓ (fine)

The killer: Using basic extraction on 300-page contracts when we only needed 5 pages.

THE SURGERY (3 AM - 5 AM):

Fix #1: Intelligent Page Detection
- Preview first page, find table of contents
- Jump directly to signature pages and payment terms
- Skip the 280 pages of boilerplate
- Time saved: 35 seconds

Fix #2: Parallel Processing
- Split into 10 parallel n8n workflows
- Each handles 1,000 documents
- Load balanced across CPU cores
- Time saved: 80% overall

Fix #3: Better Extraction
- Switched from generic to PDF Vector's LLM-enhanced mode
- Understands context, not just text patterns
- Handles their weird merged cells perfectly
- Accuracy actually IMPROVED to 99.4%

Fix #4: Caching
- MD5 hash of each document
- Skip if already processed
- 1,100 were duplicates
- Time saved: 55 minutes

THE RESULT:

Saturday 6 AM test run:
- 10,000 contracts
- Total time: 8 hours 23 minutes
- Average per document: 3 seconds
- Accuracy: 99.4%
- Client reaction: "You're a f***ing wizard"

Sunday: I slept for 14 hours

THE TOOLS:
- n8n with SplitInBatches for parallel processing
- Redis for caching
- PDF Vector API (handles huge docs without memory issues)
- 1 very large coffee

Lesson learned: Most "slow" workflows are doing unnecessary work. Find what you actually need and skip everything else.

What's your worst performance nightmare? Let's optimize it together ⚡
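
For anyone who wants to try the Fix #4 idea (hash each document, skip it if already processed), here is a minimal Python sketch. It is not the actual n8n/Redis setup from that weekend: the plain dict stands in for Redis, and `md5_hex`, `process_unique`, and `fake_extract` are hypothetical names made up for illustration. The `extract` callback is a placeholder for whatever extraction step you use, not a real PDF Vector call.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the raw document bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def process_unique(
    documents: Iterable[Tuple[str, bytes]],
    extract: Callable[[bytes], str],
    cache: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Run `extract` once per distinct document content.

    `cache` maps MD5 hex digest -> extraction result. Here it is a dict
    (assumption: a stand-in for the Redis cache mentioned in the post).
    Returns (results by document name, names skipped as duplicates).
    """
    results: Dict[str, str] = {}
    skipped: List[str] = []
    for name, data in documents:
        key = md5_hex(data)
        if key in cache:
            # Same bytes seen before: reuse the stored result, skip extraction.
            results[name] = cache[key]
            skipped.append(name)
            continue
        cache[key] = extract(data)
        results[name] = cache[key]
    return results, skipped


if __name__ == "__main__":
    # Toy data: the third "document" has the same bytes as the first.
    docs = [
        ("a.pdf", b"contract A"),
        ("b.pdf", b"contract B"),
        ("a_copy.pdf", b"contract A"),
    ]

    def fake_extract(data: bytes) -> str:
        # Placeholder for the slow extraction step.
        return data.decode().upper()

    results, skipped = process_unique(docs, fake_extract, {})
    print(skipped)  # ['a_copy.pdf']
```

MD5 is fine here because the hash only detects identical files; it is not being used for security. Note that it only catches byte-for-byte duplicates, so two scans of the same contract would still be processed twice.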