Help with data parsing · Clief Notes

Hi guys! I'm in a doozy of a situation: I've got about 700 pages of text that I'm trying to parse and extract knowledge from. Here's my pipeline right now:

Corpus: hundreds of messy, multi-topic text pages where the useful knowledge is scattered throughout and topics drift. Pages are grouped into "review packages".
Packaging: each review package is turned into a compact "bundle" + manifest, then handed to an LLM extraction worker that reads only that bundle (no raw DB access).
Extraction: the worker writes a standardized "lean note", basically a TL;DR.
Compilation: notes are merged into long-lived, topic-based knowledge files ("epics") via targeted inserts, with a processed-index ledger for idempotency and a cap on epic fan-out so facts don't get duplicated.
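
To make the compilation step concrete, here is a minimal Python sketch of the ledger and fan-out cap. It is an illustration, not the actual pipeline code. Every name in it (`compile_note`, `MAX_EPICS_PER_NOTE`, the JSON ledger file, the in-memory `epics` dict) is a hypothetical stand-in, and a plain append stands in for the "targeted insert". It assumes a note is identified by a hash of its text and that some earlier step has already ranked the candidate epics.

```python
import hashlib
import json
from pathlib import Path

# Assumed cap: a note may be inserted into at most this many epics.
MAX_EPICS_PER_NOTE = 3


def note_id(note_text: str) -> str:
    """Stable ID for a lean note, derived from its content."""
    return hashlib.sha256(note_text.encode("utf-8")).hexdigest()


def load_ledger(path: Path) -> set[str]:
    """Load the processed-index ledger (a JSON list of note IDs)."""
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")))
    return set()


def save_ledger(path: Path, ledger: set[str]) -> None:
    """Persist the ledger so a re-run skips work that is already done."""
    path.write_text(json.dumps(sorted(ledger)), encoding="utf-8")


def compile_note(
    note_text: str,
    ranked_epics: list[str],
    epics: dict[str, list[str]],
    ledger: set[str],
) -> bool:
    """Merge one note into its top-ranked epics.

    Returns False (and changes nothing) if the note was already processed.
    """
    nid = note_id(note_text)
    if nid in ledger:
        return False  # idempotent: a second run is a no-op
    # Capped fan-out: only the first MAX_EPICS_PER_NOTE epics get the note.
    for epic in ranked_epics[:MAX_EPICS_PER_NOTE]:
        epics.setdefault(epic, []).append(note_text)
    ledger.add(nid)
    return True
```

The point of the ledger is that an interrupted run (say, after hitting a session limit) can resume without re-merging notes that are already in the epics.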

TL;DR: I've got 700 pages of text I want summarized, and getting through 10% of it used up two session limits on my Codex Pro plan. Any ideas or tools that could help? Thanks fam!
