ICM and Binary Files · Clief Notes

ICM and Binary Files

To keep version control efficient, we avoid tracking binary files such as .docx or .xlsx. Instead, the repository stores a Python generator script and Markdown files containing the parameters for each document. The script compiles the Markdown into the final binary artifact, which keeps the repository small and makes diffs meaningful, since only the Markdown text changes between commits.
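As a rough illustration of what a meaningful diff looks like here, the sketch below compares two versions of a parameter file with the standard-library `difflib`. The file name `quote.md` and the table rows are made up for the example.

```python
import difflib

# Two hypothetical revisions of the same Markdown parameter file.
old = ["| Pump P-100 | 2 | 1250.00 |", "| Valve V-20 | 4 | 310.50 |"]
new = ["| Pump P-100 | 3 | 1250.00 |", "| Valve V-20 | 4 | 310.50 |"]

# A line-based diff shows exactly which parameter changed, which a
# binary .xlsx blob cannot do.
diff = list(difflib.unified_diff(old, new, "quote.md", "quote.md", lineterm=""))
print("\n".join(diff))
```

The changed quantity shows up as one removed and one added line, which is what a reviewer would see in a normal Git diff of the Markdown.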

We apply this pattern to engineering quotation spreadsheets. An LLM analyses the specific sale and writes a Markdown file; the Python script then generates the Excel file from the Markdown parameters.
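A minimal sketch of the forward direction, under some assumptions: the Markdown file holds a single pipe table with the hypothetical columns `item`, `qty` and `unit_price`, and the spreadsheet is written with the third-party `openpyxl` package. The function names are illustrative, not the actual generator.

```python
# Hypothetical parameter file format: a single Markdown pipe table.
SAMPLE_MD = """\
| item | qty | unit_price |
| --- | --- | --- |
| Pump P-100 | 2 | 1250.00 |
| Valve V-20 | 4 | 310.50 |
"""

def parse_bom_markdown(text):
    """Return (header, rows) from the pipe table in the text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    cells = [[c.strip() for c in ln.strip("|").split("|")] for ln in lines]
    header, body = cells[0], cells[2:]  # cells[1] is the |---| separator row
    rows = [[item, int(qty), float(price)] for item, qty, price in body]
    return header, rows

def write_quotation(header, rows, path):
    """Write the BOM to an .xlsx file (needs the openpyxl package installed)."""
    from openpyxl import Workbook  # imported lazily so parsing works without it
    wb = Workbook()
    ws = wb.active
    ws.title = "BOM"
    ws.append(header)
    for row in rows:
        ws.append(row)
    wb.save(path)

header, rows = parse_bom_markdown(SAMPLE_MD)
```

Calling `write_quotation(header, rows, "quote.xlsx")` would then produce the artifact; only the Markdown and the script need to be committed.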

However, users must modify the generated Excel directly to adjust the Bill of Materials (e.g., adding rows or changing products) for real-time sensitivity analysis. This human intervention alters the sheet structure, which breaks the one-way Markdown-to-Excel flow and causes a standard deterministic reverse parser to fail.

The proposed mechanism to handle this is:

1. The user edits the Bill of Materials directly within the generated Excel artifact.

2. A deterministic Python script attempts to parse this modified sheet to update the original Markdown parameters.

3. If structural modifications cause the deterministic script to fail, an LLM agent is triggered as a fallback to probabilistically parse the unstructured sheet and reconstruct the Markdown data.
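The three steps above can be sketched as follows. This is only an outline under stated assumptions: the sheet has already been read into a list of rows (for example with `openpyxl`), the BOM columns are the hypothetical `item`, `qty` and `unit_price`, and `llm_parse` is a placeholder for whatever agent call is used. One assumption added here that is not in the description above: the fallback output is passed through the same validation as the deterministic result, so a malformed LLM answer is rejected rather than written back to Markdown.

```python
EXPECTED_HEADER = ["item", "qty", "unit_price"]  # hypothetical BOM columns

class ParseError(ValueError):
    """Raised when the data no longer matches the expected layout."""

def validate(bom):
    """Reject structurally wrong records, whoever produced them."""
    if not bom:
        raise ParseError("empty BOM")
    for rec in bom:
        if set(rec) != set(EXPECTED_HEADER):
            raise ParseError(f"unexpected fields: {sorted(rec)}")
        if rec["qty"] <= 0 or rec["unit_price"] < 0:
            raise ParseError(f"implausible values: {rec!r}")
    return bom

def parse_deterministic(rows):
    """Find the header row, then read records until the first blank row."""
    header_idx = None
    for i, row in enumerate(rows):
        cells = ["" if c is None else str(c).strip().lower() for c in row]
        if cells[:len(EXPECTED_HEADER)] == EXPECTED_HEADER:
            header_idx = i
            break
    if header_idx is None:
        raise ParseError("header row not found")
    bom = []
    for row in rows[header_idx + 1:]:
        if all(c is None or str(c).strip() == "" for c in row):
            break
        try:
            item, qty, price = row[:3]
            bom.append({"item": str(item), "qty": int(qty), "unit_price": float(price)})
        except (TypeError, ValueError) as exc:
            raise ParseError(f"bad row {row!r}") from exc
    return validate(bom)

def recover_bom(rows, llm_parse):
    """Try the deterministic parser first; call the LLM only on failure."""
    try:
        return parse_deterministic(rows), "deterministic"
    except ParseError:
        return validate(llm_parse(rows)), "llm"

# Stand-in for the real agent call, used only to exercise the fallback path.
def fake_llm(rows):
    return [{"item": "Pump P-100", "qty": 2, "unit_price": 1250.0}]

clean = [["item", "qty", "unit_price"], ["Pump P-100", 2, 1250.0]]
edited = [["Quotation", None, None], ["qty", "item", "unit_price"], [2, "Pump P-100", 1250.0]]
print(recover_bom(clean, fake_llm)[1])
print(recover_bom(edited, fake_llm)[1])
```

With this shape, the LLM latency is only paid when the deterministic parser gives up, and the returned label records which path produced the data so it can be logged or reviewed.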

I am looking for objective feedback on the architectural robustness of this pipeline, specifically regarding latency, reliability, and the viability of using an LLM as a fault-tolerant layer for reverse-parsing human-altered structured data.
