Help with data parsing
Hi guys! I'm in a doozy: I've got about 700 pages of text that I'm trying to parse and extract knowledge from. Right now:
  • Corpus: hundreds of messy, multi-topic text pages where useful knowledge is scattered throughout and topics drift. Pages are grouped into "review packages".
  • Packaging: each review package is turned into a compact "bundle" + manifest, then handed to an LLM extraction worker that reads only that bundle (no raw DB access)
  • Extraction: the worker writes a standardized "lean note" (a TL;DR of the bundle)
  • Compilation: notes are merged into long-lived, topic-based knowledge files ("epics") via targeted inserts, with a processed-index ledger for idempotency and capped epic fan-out so facts don't get duplicated
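
For anyone trying to picture the compilation step, here is a rough sketch of how a processed-index ledger and a fan-out cap could work together. This is not my actual code, just an illustration: the `Compiler` class, the `compile_note` method, the cap of 2, and hashing the note text as the ledger key are all assumptions made up for the example.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical cap: a single note may be inserted into at most this many epics.
MAX_EPICS_PER_NOTE = 2


@dataclass
class Compiler:
    # Topic name -> list of note texts merged into that epic (assumed layout).
    epics: dict[str, list[str]] = field(default_factory=dict)
    # Processed-index ledger: hashes of notes that were already compiled.
    processed: set[str] = field(default_factory=set)

    def compile_note(self, note_text: str, topics: list[str]) -> list[str]:
        """Insert a note into its epics once; return the epics it landed in."""
        key = hashlib.sha256(note_text.encode("utf-8")).hexdigest()
        if key in self.processed:
            # Idempotency: re-running the same note changes nothing.
            return []
        # Capped fan-out: keep only the first few candidate topics.
        targets = topics[:MAX_EPICS_PER_NOTE]
        for topic in targets:
            self.epics.setdefault(topic, []).append(note_text)
        self.processed.add(key)
        return targets


compiler = Compiler()
first = compiler.compile_note("Use WAL mode for SQLite.", ["sqlite", "perf", "ops"])
second = compiler.compile_note("Use WAL mode for SQLite.", ["sqlite", "perf", "ops"])
```

In this sketch the first call inserts the note into the first two topics only, and the second call is a no-op because the hash is already in the ledger. A real version would presumably persist the ledger and the epics to disk between runs instead of keeping them in memory.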
TL;DR: I've got 700 pages of text I want summarized, and getting through 10% of it used up two session limits on my Codex Pro plan. Any ideas or tools that could help? Thanks fam!
Novus Vella
Clief Notes
skool.com/cliefnotes
Jake Van Clief, giving you the Cliff notes on the new AI age.