Chunking: Strategies, Considerations and Optimization (Part 2 of RAG Series)
Bits, pieces and coherency
This is part 2 of the “Retrieval-Augmented Generation (RAG) — Basics to Advanced” series. Links to the other blogs in the series are at the bottom of this post. Picking up from part 1, this blog focuses on the “Chunking” component (highlighted in blue), which applies both to chunking the source content and to chunking the query. Since the concept is fundamentally the same in both cases, we will cover them together.
Chunking Basics
As discussed earlier, the data from the source document (the information to be retrieved) and the query must be converted into a mathematical form. The first step in this process is breaking the text down into smaller pieces, hereafter called “chunks”; the process itself is “chunking”. The idea is that once the text is chunked appropriately, each chunk can be converted into a mathematical form (an embedding, the subject of the next blog in the series).
This sounds straightforward, but it raises some challenges: how should we go about chunking? Should we chunk by every word, every few words, every sentence, a few sentences together, and so on? What would give us the best results?
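To make these options concrete, here is a minimal sketch of two of the granularities mentioned above: fixed-size word chunks and sentence-group chunks. The function names and the naive regex-based sentence splitter are illustrative assumptions, not a recommended production splitter.

```python
import re

def chunk_by_words(text, words_per_chunk=50):
    """Split text into fixed-size chunks of N words each."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def chunk_by_sentences(text, sentences_per_chunk=3):
    """Group consecutive sentences into chunks.

    Naive split on sentence-ending punctuation; real pipelines would
    use a proper sentence tokenizer.
    """
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]
```

For example, `chunk_by_sentences(text, 2)` would group every two consecutive sentences into one chunk; varying these parameters is exactly the decision the rest of this blog discusses.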
Before deciding on the right answer, let’s think about what these chunks will be used for. In the “Retrieval Engine”, document chunks are compared with the query chunks, and the chunks that are contextually closest to the query are retrieved and sent to the LLM.
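The “contextually closest” comparison can be sketched as a similarity search over chunk vectors. Below is a toy illustration: the `embed` function here is just a bag-of-words stand-in for a real embedding model (covered in the next blog), and the helper names are assumptions for illustration only.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

The point of the sketch: whatever chunking strategy we pick determines what units this similarity comparison operates on, which is why the strategy matters so much.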
Given this, below are some considerations we should keep in mind while defining a chunking strategy.
Considerations for chunking
The chunking strategy has a significant impact on the overall efficacy of the RAG output. The key principle to keep in mind while defining the strategy is that every chunk should, to the extent possible, encapsulate a single “context” or theme; in other words, the context may differ from one chunk to the next. The following are the key levers one could consider:
- Size of the…