I'm a bit curious: I watched an implementation of small-to-big retrieval and noticed that it indexes the same information at several chunk sizes. During retrieval, wouldn't we end up getting the same information back at multiple chunk sizes?
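For concreteness, here's a minimal sketch of the indexing step I'm picturing (`Chunk` and `build_small_to_big` are made-up names, and I'm splitting by word count rather than tokens):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    size: int
    parent_id: int

def build_small_to_big(parent_text: str, parent_id: int,
                       child_sizes=(128, 256, 512)) -> list[Chunk]:
    """Split one parent into child chunks at several granularities.

    Every child points back to the same parent, so the parent's
    content is indexed redundantly at each size -- which is exactly
    the overlap I'm asking about.
    """
    words = parent_text.split()
    children = []
    for size in child_sizes:
        # Naive word-count splitter; real implementations split by tokens.
        for i in range(0, len(words), size):
            children.append(Chunk(" ".join(words[i:i + size]), size, parent_id))
    return children
```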
Right, but wouldn't you limit the "different" chunks you are getting, essentially limiting the context? Because you might end up retrieving essentially the same info (same parent) that is present in the 256, 512, and 1024 chunks alike.
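Something like this is what I mean by the duplicates crowding out the context; a rough sketch, assuming each hit is a dict with a `score` and a `parent_id` (both names are just my illustration):

```python
def dedupe_by_parent(hits: list[dict], top_k: int) -> list[dict]:
    """Keep only the best-scoring hit per parent, so near-duplicate
    chunks of the same passage don't fill the whole context window."""
    hits = sorted(hits, key=lambda h: h["score"], reverse=True)
    seen, unique = set(), []
    for h in hits:
        if h["parent_id"] in seen:
            continue  # same underlying passage already selected
        seen.add(h["parent_id"])
        unique.append(h)
        if len(unique) == top_k:
            break
    return unique
```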
Interesting, so although we might be creating chunks of smaller sizes, e.g. 128, 256, and 512 (given a parent of 1024), only the smallest ones, in our case the 128s, are actually used for retrieval. I suppose the intermediate ones are used for the merging process you mentioned? Really appreciate the insights you've provided.
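To check my understanding of that merging step, here's a rough sketch of what I imagine it does: if enough of a parent's children are retrieved, swap them for the parent itself. The `parents` and `children_per_parent` mappings, the function name, and the 0.5 threshold are all my assumptions, not the implementation you described:

```python
from collections import defaultdict

def merge_to_parents(hits: list[dict], parents: dict[int, str],
                     children_per_parent: dict[int, int],
                     threshold: float = 0.5) -> list[dict]:
    """Replace groups of retrieved child chunks with their parent when
    a large enough fraction of that parent's children was retrieved."""
    by_parent = defaultdict(list)
    for h in hits:
        by_parent[h["parent_id"]].append(h)
    merged = []
    for pid, group in by_parent.items():
        if len(group) / children_per_parent[pid] >= threshold:
            # Enough coverage: hand the LLM the full parent instead.
            merged.append({"text": parents[pid], "parent_id": pid})
        else:
            merged.extend(group)  # keep the individual child chunks
    return merged
```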
Thank you. I do have a follow-up question, though it's more a matter of opinion: would you pair this retrieval approach with a reranker? If so, I suppose you would apply it after the chunks were merged.
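In other words, something like this, where the cross-encoder scores the merged chunks rather than the tiny child fragments (the sentence-transformers model is just an example; `rerank_merged` and the dict shape are my own sketch):

```python
from sentence_transformers import CrossEncoder

def rerank_merged(query: str, merged: list[dict], top_n: int = 5) -> list[dict]:
    """Rerank *after* merging, so the reranker sees full parent context."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, m["text"]) for m in merged])
    ranked = sorted(zip(merged, scores), key=lambda p: p[1], reverse=True)
    return [m for m, _ in ranked[:top_n]]
```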