An Idea (by Ingenious Piece)

No Matter What People Tell You, Words And Ideas Can Change The World.

Chunking: Strategies, Considerations and Optimization (Part 2 of RAG Series)

Bits, pieces and coherency

Chandan Durgia
An Idea (by Ingenious Piece)
6 min read · Jan 25, 2024

Photo by Markus Spiske on Unsplash

This is part 2 of the “Retrieval-Augmented Generation (RAG) — Basics to Advanced Series”. Links to the other blogs in the series are at the bottom of this blog. Continuing from part 1, this blog focuses on the “Chunking” component (highlighted in blue), which applies both to chunking the source content and to chunking the query. Since the concept is fundamentally the same in both cases, we will cover them together.

Image by Author: RAG Architecture

Chunking Basics

As discussed earlier, the text from the source document (used for retrieving information) or from the query has to be converted into a mathematical form. The first step in this process is breaking the text down into smaller pieces, hereafter called “chunking”. Once the text is chunked appropriately, every chunk is converted into a mathematical form (an embedding, the topic of the next blog in the series).
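
To make the two steps concrete, here is a minimal Python sketch of the chunk-then-embed flow, assuming plain fixed-size word chunks. The names `chunk_text`, `toy_embed` and `chunk_and_embed` are hypothetical. `toy_embed` is only a hashed bag-of-words placeholder standing in for a real embedding model. It shows the shape of the step (text in, vector out) and is not how production embeddings are computed.

```python
import math
import re
import zlib


def chunk_text(text: str, chunk_size: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(chunk: str, dim: int = 16) -> list[float]:
    """ASSUMPTION: placeholder for a real embedding model.
    Builds a hashed bag-of-words vector and L2-normalises it."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", chunk.lower()):
        # crc32 is deterministic, so a token always lands in the same bucket.
        bucket = zlib.crc32(token.encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk_and_embed(text: str, chunk_size: int = 50) -> list[tuple[str, list[float]]]:
    """Step 1: chunk the text. Step 2: turn every chunk into a vector."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_text(text, chunk_size)]


doc = "RAG retrieves relevant text. " * 30  # 4 words x 30 = 120 words
pairs = chunk_and_embed(doc, chunk_size=50)
print(len(pairs))  # 3 (chunks of 50, 50 and 20 words)
```

In a real pipeline, `toy_embed` would be replaced by a call to an embedding model. The chunking step in front of it stays the same.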

Sounds straightforward, right? But here come some challenges: how should we go about chunking? Should we chunk by every word, every few words, every sentence, a few sentences together, etc.? Which option would give us the best results?
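
The granularity options listed above can be sketched as small Python functions. This is an illustration under simplifying assumptions. The helper names are hypothetical, and `by_sentence` uses a naive regular expression that splits after `.`, `!` or `?`, so it will mis-handle abbreviations such as "e.g.".

```python
import re


def by_word(text: str) -> list[str]:
    """One chunk per whitespace-separated word."""
    return text.split()


def by_n_words(text: str, n: int) -> list[str]:
    """Fixed windows of n words, ignoring sentence boundaries."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


def by_sentence(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def by_n_sentences(text: str, n: int) -> list[str]:
    """Group every n consecutive sentences into one chunk."""
    sentences = by_sentence(text)
    return [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]


text = "Cats purr. Dogs bark loudly. Birds sing at dawn."
print(by_n_words(text, 4))      # ['Cats purr. Dogs bark', 'loudly. Birds sing at', 'dawn.']
print(by_n_sentences(text, 2))  # ['Cats purr. Dogs bark loudly.', 'Birds sing at dawn.']
```

The fixed four-word window cuts "Dogs bark loudly." in half. The sentence-based options keep each sentence intact. This is exactly the trade-off a chunking strategy has to manage.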

Before looking for the right answer, let’s consider what these chunks will be used for. In the “Retrieval Engine”, the document chunks are compared with the query chunks. The document chunks that are contextually closest to the query chunks are retrieved and sent to the LLM.
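
Here is a minimal sketch of that comparison step. It rests on several assumptions. Cosine similarity over simple bag-of-words counts stands in for real embeddings. The whole query is treated as a single chunk. The function names `vectorize`, `cosine` and `retrieve` are hypothetical. A real retrieval engine would use an embedding model and usually a vector index instead.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """ASSUMPTION: bag-of-words term counts as a stand-in for an embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, doc_chunks: list[str], k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(doc_chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The Eiffel Tower is in Paris.",
    "Photosynthesis converts light into chemical energy.",
    "Paris is the capital of France.",
]
print(retrieve("What is the capital of France?", chunks, k=1))
# ['Paris is the capital of France.']
```

The score is computed per chunk. A chunk that mixes unrelated themes therefore tends to dilute its similarity to any single query.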

Given this, below are some considerations to keep in mind when defining a chunking strategy.

Considerations for chunking

The chunking strategy has a significant impact on the overall efficacy of the RAG output. The key principle when defining the strategy is that every chunk should, as far as possible, encapsulate a single “context” or theme. The context may differ from one chunk to the next. The following are the key levers one could consider:

  1. Size of the