Chunking for RAG: What Actually Works in Production
While writing a lot of Upwork proposals for my business outreach, I keep seeing the same question come up again and again in RAG-related projects: "How do you handle chunking properly?" So I spent time designing a clean, production-ready chunking strategy in n8n, and this is the approach I now use.

The goal: give the LLM chunks that are meaningful, well-sized, and aware of where they sit in the document. Here's the three-step strategy.

1. Smart Markdown chunking (content-first)

Instead of splitting text by character count:

• Split by Markdown headings first (#, ##, ###)
• Recursively split large sections by paragraphs, code blocks, and sentences
• Merge tiny chunks to avoid low-signal embeddings
• Keep chunk sizes stable for embedding

Result: chunks that still make sense when read alone.

2. Extract document hierarchy (structure-first)

Chunking alone loses structure, so I separately extract:

• Section titles
• Heading levels
• Parent → child relationships
• Full section paths (e.g. Docs > API > Auth)

Then I map each section back to the chunks it spans. Result: I know exactly which chunks belong to which section.

3. Enrich chunks with section context (retrieval-first)

Finally, I merge both worlds. Each chunk keeps its text, plus metadata like:

• Section range
• Parent section range
• Page numbers (if PDF-based)

Result: every chunk "knows" its place in the document.

Why this matters for RAG

• Better retrieval accuracy
• Easier citations ("this answer comes from section X")
• The ability to retrieve full sections, not just isolated chunks
• Cleaner UX for chatbots and agents

Mental model:

Markdown → smart chunks → hierarchy → hierarchy-aware chunks → vector DB

Since applying this, retrieval quality has been far more stable on long docs, knowledge bases, and websites.

Curious how others here approach chunking for RAG. Do you keep it simple, or have you already hit the limits of naive splitting?
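For anyone who wants something concrete, step 1 (heading-first splitting with recursive paragraph splits and tiny-chunk merging) could be sketched roughly like this in Python. The function and parameter names are mine, not from the n8n workflow, and a real version would also split on code blocks and sentences:

```python
import re

def chunk_markdown(md: str, max_chars: int = 1500, min_chars: int = 200) -> list[str]:
    """Split Markdown by headings, then paragraphs; merge undersized chunks."""
    # Split on heading lines (#, ##, ###), keeping each heading with its section.
    sections = [s for s in re.split(r"(?m)^(?=#{1,3} )", md) if s.strip()]

    chunks: list[str] = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section.strip())
            continue
        # Oversized section: fall back to splitting on blank lines (paragraphs).
        buf = ""
        for para in re.split(r"\n\s*\n", section):
            if buf and len(buf) + len(para) > max_chars:
                chunks.append(buf.strip())
                buf = ""
            buf += para + "\n\n"
        if buf.strip():
            chunks.append(buf.strip())

    # Merge tiny chunks into their predecessor to avoid low-signal embeddings.
    merged: list[str] = []
    for chunk in chunks:
        if merged and len(chunk) < min_chars:
            merged[-1] += "\n\n" + chunk
        else:
            merged.append(chunk)
    return merged
```

Because the heading stays attached to its section, each chunk still reads as a self-contained unit.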
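Step 2 (hierarchy extraction) is mostly a stack over heading levels. A minimal sketch, again with names of my own choosing:

```python
import re

def extract_hierarchy(md: str) -> list[dict]:
    """Return each section's title, level, parent, and full path (e.g. Docs > API > Auth)."""
    stack: list[tuple[int, str]] = []  # (level, title) of the currently open ancestors
    sections: list[dict] = []
    for line_no, line in enumerate(md.splitlines(), start=1):
        m = re.match(r"^(#{1,6}) (.+)", line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        # Pop siblings and deeper headings until the top of the stack is this heading's parent.
        while stack and stack[-1][0] >= level:
            stack.pop()
        sections.append({
            "title": title,
            "level": level,
            "parent": stack[-1][1] if stack else None,
            "path": " > ".join([t for _, t in stack] + [title]),
            "line": line_no,
        })
        stack.append((level, title))
    return sections
```

The recorded line numbers are what let you map each section back to the chunks it spans.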
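And step 3 (enrichment) just joins the two. Assuming section records shaped like the hierarchy above (title, path, parent), one simple way to attach them, matching chunks by their opening heading, is:

```python
def enrich_chunks(chunks: list[str], sections: list[dict]) -> list[dict]:
    """Attach section metadata to each chunk; chunks inherit the last seen section."""
    enriched: list[dict] = []
    current: dict | None = None
    for chunk in chunks:
        title = chunk.splitlines()[0].lstrip("# ").strip()
        # If the chunk opens with a known heading, switch the active section.
        match = next((s for s in sections if s["title"] == title), None)
        if match:
            current = match
        enriched.append({
            "text": chunk,
            "section": current["title"] if current else None,
            "section_path": current["path"] if current else None,
            "parent_section": current["parent"] if current else None,
        })
    return enriched
```

These enriched records are what go into the vector DB, so every retrieved chunk can cite its section path.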