RAG chunking best practices
Chunking splits documents into pieces that fit embedding models and LLM context windows. Poor chunking is one of the top reasons RAG systems retrieve irrelevant text or miss the right passage. Use these practices to improve recall and answer quality.
Match chunk size to your embedding model
Each embedding model has a token limit. Chunks that are too long get truncated; chunks that are too small lose surrounding context. Start near the model’s sweet spot (often a few hundred to ~1–2k tokens) and adjust with retrieval metrics.
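A minimal sketch of budget-based chunking. A real pipeline should count tokens with the embedding model's own tokenizer (e.g. tiktoken for OpenAI models); here a plain whitespace split stands in as an illustrative proxy, and the 512-token budget is an assumed example value.

```python
def chunk_by_token_budget(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack whitespace tokens into chunks of at most max_tokens.

    Whitespace tokens are a rough proxy; swap in the real tokenizer
    of your embedding model before relying on the counts.
    """
    tokens = text.split()
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
    return chunks

# A 1200-token document yields chunks of 512, 512, and 176 tokens.
doc = "word " * 1200
chunks = chunk_by_token_budget(doc, max_tokens=512)
```

Measure retrieval quality at a few budgets (e.g. 256, 512, 1024) rather than assuming one size fits every corpus.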
Use overlap between chunks
Small overlap (e.g. 10–20% of chunk length) helps when a fact spans two boundaries. Too much overlap duplicates content in the index and can dilute ranking—tune for your corpus.
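The sliding-window version of the chunker above can be sketched as follows; the 200-token window with 30-token overlap (15%) is an assumed example setting, not a recommendation.

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding window: each chunk repeats the last `overlap` tokens
    of the previous chunk, so a fact straddling a boundary appears
    whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_with_overlap(tokens, size=200, overlap=30)  # 15% overlap
```

Note that overlap inflates index size by roughly `overlap / (size - overlap)`, which is one reason to keep it modest.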
Split on structure, not only character count
Prefer boundaries at headings, paragraphs, and list items. For tables, keep table rows or the full table in one chunk when possible so numeric answers stay coherent. WeKnora’s document understanding focuses on structure-aware parsing; see Document AI.
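One way to sketch structure-aware splitting: break at blank lines (paragraph boundaries), merge small paragraphs up to a character budget, and always start a new chunk at a heading so a section title travels with its body text. The `#` heading convention and 1000-character budget are assumptions for illustration; WeKnora's own parsing is more sophisticated than this.

```python
import re

def split_on_structure(text: str, max_chars: int = 1000) -> list[str]:
    """Split at paragraph boundaries, merging paragraphs until the
    character budget is hit. A heading (a line starting with '#')
    always opens a new chunk."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        starts_section = para.startswith("#")
        if current and (starts_section or len(current) + len(para) > max_chars):
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "# Intro\n\nFirst paragraph.\n\n# Pricing\n\nSecond paragraph."
sections = split_on_structure(doc)  # two chunks, each heading with its body
```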
Enrich metadata
Store document id, section title, page number, and access tags with each chunk. Metadata enables filtering (e.g. by product line) and better citations in chat-with-documents experiences.
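A sketch of the record stored alongside each embedding; the field names here are illustrative and should be mapped to whatever schema your vector store uses.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A chunk plus the metadata stored next to its embedding.
    Field names are examples, not a fixed schema."""
    text: str
    doc_id: str
    section_title: str
    page: int
    access_tags: list[str] = field(default_factory=list)

chunks = [
    Chunk("Model X supports a 128k context.", "spec-001", "Limits", 4, ["product-x"]),
    Chunk("Model Y pricing is tiered.", "spec-002", "Pricing", 2, ["product-y"]),
]

# Metadata lets you filter before (or alongside) vector search,
# e.g. restrict retrieval to one product line:
visible = [c for c in chunks if "product-x" in c.access_tags]
```

The same fields (doc_id, section_title, page) are what you surface as citations in a chat-with-documents answer.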
Test with real user questions
Build a small evaluation set of questions and gold passages. If many misses occur at chunk boundaries, increase overlap or change split strategy. Pair chunking changes with embedding and retrieval tuning.
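The evaluation loop above can be scored with a simple recall@k over gold chunk ids; the question and chunk ids below are made-up example data.

```python
def recall_at_k(retrieved: dict[str, list[str]],
                gold: dict[str, set[str]],
                k: int = 5) -> float:
    """Fraction of questions where at least one gold chunk id
    appears among the top-k retrieved ids."""
    hits = sum(
        1 for q, gold_ids in gold.items()
        if gold_ids & set(retrieved.get(q, [])[:k])
    )
    return hits / len(gold)

gold = {"q1": {"c7"}, "q2": {"c2", "c9"}}
retrieved = {"q1": ["c1", "c7", "c3"], "q2": ["c4", "c5", "c6"]}
score = recall_at_k(retrieved, gold, k=3)  # 0.5: q1 hit, q2 missed
```

Re-run the same metric after each chunking change so you can attribute gains to the split strategy rather than to embedding or retriever tweaks made at the same time.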