Question:
I’m looking for help achieving high-accuracy OCR from a complex vector PDF.
The document is 30+ pages long and contains multiple entities, each with several tables; some entities span multiple pages. Every page also has a repeating header, which complicates parsing.
I’ve tried several approaches, but the extraction is not accurate enough and misses data. I need 100% accuracy.
Any guidance on reliable tools or pipelines would be appreciated.
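
To make the repeating-header issue concrete, here is a minimal sketch of one possible way to drop lines that recur on most pages before stitching multi-page entities back together. It assumes the text has already been extracted into one list of lines per page by some other tool; the names `strip_repeating_lines`, `merge_pages`, and `min_fraction` are hypothetical and only for illustration.

```python
from collections import Counter


def strip_repeating_lines(pages, min_fraction=0.8):
    """Drop lines that occur on at least `min_fraction` of all pages.

    `pages` is a list of pages, each page a list of text lines
    (assumed to come from an earlier extraction step).
    """
    if not pages:
        return []
    # Count each distinct (whitespace-trimmed) line at most once per page.
    counts = Counter()
    for page in pages:
        counts.update({line.strip() for line in page})
    threshold = min_fraction * len(pages)
    repeated = {line for line, n in counts.items() if line and n >= threshold}
    # Keep only the lines that are not page-level repeats.
    return [[line for line in page if line.strip() not in repeated]
            for page in pages]


def merge_pages(pages):
    """Concatenate pages into one line stream so multi-page entities are contiguous."""
    return [line for page in pages for line in page]


# Hypothetical sample data: "ACME Report" stands in for the repeating header.
sample = [
    ["ACME Report", "Entity A", "row 1"],
    ["ACME Report", "row 2"],
    ["ACME Report", "Entity B", "row 3"],
]
cleaned = strip_repeating_lines(sample)
merged = merge_pages(cleaned)
```

One caveat with this kind of frequency-based filter: a table column header that legitimately appears on every page would be removed as well, so the threshold and the matching rule would likely need tuning for the real document.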