Sort of a general RAG question (using llama-index) for anyone. Say you have some sample text data:
I have a corpus of documents that I have broken down into chunks. Each chunk is about 20 sentences long. I also chunked these documents with a sliding window to maintain context. I used the OpenAI embeddings model to create a vector for each chunk of text. Currently, when the user submits a query, the app will embed this query, perform a semantic search against the vector database, then provide the GPT model the top 10 chunks of text with the user query, and GPT then provides an answer to the query.
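For concreteness, the current setup looks roughly like this in llama-index (a minimal sketch, assuming the newer `llama_index.core` package layout; model names, chunk sizes, and the corpus path are placeholders for the "20 sentences + sliding window" scheme):

```python
# Sketch of the pipeline described above, assuming llama-index >= 0.10 imports.
# Model names, chunk sizes, and the corpus path are placeholders.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Fixed-size chunks with overlap approximate the sliding-window chunking.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

documents = SimpleDirectoryReader("./corpus").load_data()
index = VectorStoreIndex.from_documents(documents)

# Embed the query, retrieve the top 10 chunks, and let the LLM answer from them.
query_engine = index.as_query_engine(similarity_top_k=10)
print(query_engine.query("What does the corpus say about X?"))
```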
You can use embeddings w/ llama-index's tools to semantically split this into different chunks (there's a sketch after the example chunks below).
Let's say that returns you 3 chunks:
Chunk 1: I have a corpus of documents that I have broken down into chunks. Each chunk is about 20 sentences long. I also chunked these documents with a sliding window to maintain context.
Chunk 2: I used the OpenAI embeddings model to create a vector for each chunk of text.
Chunk 3: Currently, when the user submits a query, the app will embed this query, perform a semantic search against the vector database, then provide the GPT model the top 10 chunks of text with the user query, and GPT then provides an answer to the query.
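Something like llama-index's `SemanticSplitterNodeParser` can produce that kind of split (a sketch; whether you actually get 3 chunks back depends on the breakpoint threshold and the embedding model):

```python
# Sketch of semantic chunking with llama-index's SemanticSplitterNodeParser.
# The breakpoint threshold controls how many chunks come back, so 3 isn't guaranteed.
from llama_index.core import Document
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

paragraph = (
    "I have a corpus of documents that I have broken down into chunks. "
    "Each chunk is about 20 sentences long. I also chunked these documents "
    "with a sliding window to maintain context. I used the OpenAI embeddings "
    "model to create a vector for each chunk of text. Currently, when the "
    "user submits a query, the app will embed this query, perform a semantic "
    "search against the vector database, then provide the GPT model the top "
    "10 chunks of text with the user query, and GPT then provides an answer "
    "to the query."
)

splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)
nodes = splitter.get_nodes_from_documents([Document(text=paragraph)])
for i, node in enumerate(nodes, 1):
    print(f"Chunk {i}: {node.get_content()}")
```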
What I'm wondering is basically: what does the tradeoff look like for these smaller semantic chunks as opposed to one large chunk?
In my head, if you store that initial paragraph as 1 vector vs. 3 vectors (1 for each chunk), your retrieval ability should be higher with the second approach. Each vector, to me, will be less 'diluted' in terms of info. But what happens when the information in 1 semantic unit depends on the previous one? For example, if chunk 2 only makes sense after reading chunk 1, are you SOL?
I guess I can't seem to figure out (either mathematically or logically) what that tradeoff looks like in terms of IR accuracy.
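To make the dilution intuition concrete, one toy check is to score a query against a single embedding of the whole paragraph vs. the best-scoring of the three per-chunk embeddings (the query string is a made-up placeholder and the exact scores will vary by embedding model):

```python
# Toy illustration of the "dilution" question: one big-chunk embedding vs. the
# best of three per-chunk embeddings. Query and model are placeholders.
import numpy as np
from llama_index.embeddings.openai import OpenAIEmbedding

chunk_1 = (
    "I have a corpus of documents that I have broken down into chunks. "
    "Each chunk is about 20 sentences long. I also chunked these documents "
    "with a sliding window to maintain context."
)
chunk_2 = "I used the OpenAI embeddings model to create a vector for each chunk of text."
chunk_3 = (
    "Currently, when the user submits a query, the app will embed this query, "
    "perform a semantic search against the vector database, then provide the "
    "GPT model the top 10 chunks of text with the user query, and GPT then "
    "provides an answer to the query."
)

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
query = "Which embedding model is used to vectorize the chunks?"

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q_vec = embed_model.get_query_embedding(query)
big_score = cosine(q_vec, embed_model.get_text_embedding(" ".join([chunk_1, chunk_2, chunk_3])))
small_scores = [cosine(q_vec, embed_model.get_text_embedding(c)) for c in (chunk_1, chunk_2, chunk_3)]

print(f"one big vector:    {big_score:.3f}")
print(f"best small vector: {max(small_scores):.3f}")
```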