client sent 20 test invoices. my extraction got 6 right. SIX.
tried 3 different approaches:
- built-in pdf node: worked on 4 invoices
- regex patterns: broke on every layout change
- ocr + manual parsing: took 2 hours per invoice format
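
for anyone wondering why the regex route keeps breaking, here's a rough python sketch. the invoice text and the pattern are made up for illustration (not my actual workflow), but it shows how a pattern written for one layout silently misses the same number in another:

```python
import re
from typing import Optional

# hypothetical pattern, written against a single vendor's layout
TOTAL_PATTERN = re.compile(r"Total:\s*\$(\d+\.\d{2})")

def extract_total(text: str) -> Optional[str]:
    """Return the invoice total as a string, or None if the pattern misses."""
    match = TOTAL_PATTERN.search(text)
    return match.group(1) if match else None

# made-up sample text: two layouts carrying the same amount
layout_a = "Invoice 001\nTotal: $125.00"
layout_b = "Invoice 001\nAmount Due (USD) 125.00"

result_a = extract_total(layout_a)  # pattern fits this layout
result_b = extract_total(layout_b)  # same data, different wording, no match
```

every new layout means another pattern like this, which is why it never scaled past a handful of vendors.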
was about to refund the client
then someone here mentioned pdf vector. tested it expecting nothing.
18 out of 20 extracted correctly on the first try. the llm mode handled the different layouts automatically.
same invoices. same workflow. different extraction tool. completely different results.
client has no idea how close i came to quitting lol
what tool swap has saved a project for you?