clean pdf. readable text. looked totally normal. but wrong amounts every time.
THE PROBLEM
vendor puts the discount on a separate line below the total. extraction grabbed that number instead of the actual total.
every invoice from this vendor: wrong by exactly $47.50
was about to give up. asked in another community and someone suggested trying PDF Vector with a JSON schema to define exactly which field means what.
THE FIX
switched to structured extraction. told it "total_amount" = final amount due, not subtotals or discounts.
honestly didn't expect it to work, but it now handles 12 different vendor layouts without confusion
still learning how to set up schemas properly but way better than my original approach
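for anyone curious, here's a rough sketch of the idea, not my exact setup. it's plain JSON Schema written as a python dict, where the description on each field carries the "which number is which" context. only "total_amount" is from my real schema; the "subtotal" and "discount" fields and the check_result helper are made up for illustration, and there's no actual extraction call in here since that depends on the tool you use.

```python
import json

# Hypothetical schema for illustration: only "total_amount" comes from the post,
# the other field names are assumptions.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "subtotal": {
            "type": "number",
            "description": "sum of line items before any discount",
        },
        "discount": {
            "type": "number",
            "description": "discount amount, may be printed on its own line below the total",
        },
        "total_amount": {
            "type": "number",
            "description": "final amount due. not a subtotal, not a discount",
        },
    },
    "required": ["total_amount"],
}


def check_result(result, schema=INVOICE_SCHEMA):
    """Return a list of problems in an extraction result (empty list means ok)."""
    problems = []
    # Every required field must be present.
    for name in schema["required"]:
        if name not in result:
            problems.append(f"missing required field: {name}")
    # Every returned field must be known and numeric (bool is rejected on purpose).
    for name, value in result.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
        elif spec["type"] == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            problems.append(f"{name} should be a number, got {type(value).__name__}")
    return problems


# This JSON is what you'd hand to whatever extraction tool accepts a schema.
print(json.dumps(INVOICE_SCHEMA, indent=2))
```

the check at the end is just a cheap guard: if the extractor hands back a discount but no total_amount, you find out before it hits your books.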
lesson: extraction without context = garbage
ever had extraction grab the wrong field?