Document Chunking

The process of splitting documents into smaller, meaningful segments optimized for AI retrieval and processing in RAG systems.

Why Chunking Matters

Document chunking is the process of dividing documents into smaller, semantically meaningful segments for storage in vector databases and retrieval by AI systems. It is a critical yet often underappreciated step in building retrieval-augmented generation (RAG) pipelines. Chunking quality directly impacts retrieval accuracy: chunks that are too large dilute relevance, while chunks that are too small lose important context. Getting chunking right can improve RAG performance more than upgrading the language model itself.

The fundamental challenge is preserving meaning at the segment level while keeping chunks small enough for precise retrieval and within model context window limits.

Chunking Strategies

Fixed-size chunking splits text at regular character or token intervals — simple but often breaks mid-sentence or mid-concept. Recursive character splitting divides text at natural boundaries (paragraphs, sentences) within size constraints. Semantic chunking uses embedding similarity to group related content, creating chunks that represent coherent ideas. Document-structure-aware chunking respects headings, sections, and formatting to maintain the author's organizational logic.
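A minimal sketch of recursive character splitting illustrates the idea: try the most natural separator first (paragraph breaks, then sentences, then words), and only fall back to fixed-size slicing when nothing fits. The function name, separator list, and 500-character default are illustrative choices, not a standard API; production libraries add overlap and token-aware length functions on top of this pattern.

```python
def recursive_split(text, max_chars=500, separators=("\n\n", "\n", ". ", " ")):
    """Split text at the most natural boundary that fits within max_chars."""
    if len(text) <= max_chars:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = f"{current}{sep}{part}" if current else part
                if len(candidate) <= max_chars:
                    current = candidate  # keep accumulating within the limit
                else:
                    if current:
                        chunks.append(current)
                    if len(part) <= max_chars:
                        current = part
                    else:
                        # A single part is still too long: recurse with
                        # finer-grained separators.
                        chunks.extend(recursive_split(part, max_chars, separators))
                        current = ""
            if current:
                chunks.append(current)
            return chunks
    # No separator produced a split: fall back to fixed-size slicing.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Because paragraph breaks are tried before sentence breaks, chunks tend to align with the author's own structure rather than cutting mid-concept.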

For structured documents like technical manuals or legal contracts, hierarchy-aware chunking preserves parent-child relationships between sections and subsections, enabling the retrieval system to return context alongside specific details.
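One way to preserve those parent-child relationships is to record each chunk's path of ancestor headings as it is created. The sketch below walks a markdown document and tags every chunk with its heading path; the `Chunk` type and function name are hypothetical, and a real pipeline would also enforce size limits within each section.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    # Path of ancestor headings, e.g. ["Contract", "Warranty"], so
    # retrieval can return section context alongside the specific detail.
    section_path: list = field(default_factory=list)

def chunk_markdown_by_headings(markdown: str) -> list:
    """Split a markdown document on headings, recording each chunk's
    parent-child heading path."""
    chunks, path, buffer = [], [], []

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(text, list(path)))
        buffer.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(title)
        else:
            buffer.append(line)
    flush()
    return chunks
```

Storing the heading path as metadata lets the retriever answer "not covered" questions with "Contract > Warranty > Exclusions" attached, rather than an orphaned fragment.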

Optimization Techniques

Overlap between consecutive chunks ensures that concepts spanning chunk boundaries are not lost — typically 10-20% overlap works well. Metadata enrichment attaches section titles, document source, and page numbers to each chunk for better filtering and attribution. Chunk size should be tuned empirically for your specific use case: test different sizes and measure retrieval quality on representative queries. Consider creating multiple chunk sizes from the same content, using small chunks for precise retrieval and larger chunks for providing context to the language model. Regularly evaluate chunking quality as your document corpus evolves.
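The overlap technique above amounts to a sliding window whose step is smaller than the window itself. The sketch below chunks a word list with configurable overlap and attaches provenance metadata to each chunk; it counts words for simplicity (real systems usually count model tokens), and the function and field names are illustrative.

```python
def chunk_with_overlap(words, chunk_size=100, overlap=15):
    """Sliding-window chunking with overlap, so a concept that spans a
    chunk boundary appears intact in at least one chunk."""
    step = chunk_size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # the final window already covers the tail
    return chunks

def enrich(chunk_text, source, section, page):
    # Metadata enrichment: attach provenance for filtering and attribution.
    return {"text": chunk_text, "source": source,
            "section": section, "page": page}
```

With `chunk_size=100` and `overlap=15`, consecutive chunks share 15 words (15%), within the 10-20% range suggested above; the enriched records can then be filtered by `source` or `section` at query time and cited by `page` in answers.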