The community members are discussing the trade-off between chunk size and LLM tokens when optimizing search queries. They note that reducing chunk size requires increasing the similarity_top_k parameter, which can be suboptimal, and that larger documents compound the issue, leading to reduced answer quality. Suggestions include injecting document metadata, searching the document directly for more context, and using a more advanced search algorithm. The community members also discuss using a List index per document with a simple vector index on top for better performance, but note that List index queries may be O(N), which could be problematic.
What is the trade-off space between chunk size and LLM tokens?
I have been playing around with optimizing this, and there seems to be a floor on query performance as a function of chunk size, depending on document size. Increasing chunk size increases the number of LLM tokens sent for the query response, however.
I am thinking of parameterizing chunk size as a function of document size and optimizing search queries based on that, but I would appreciate general thoughts to vet the concept.
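A minimal sketch of what that parameterization could look like. The scaling heuristic and constants here are assumptions, and `chunk_size_limit` is the knob older gpt_index versions expose on index constructors; your version's parameter name may differ.

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

def chunk_size_for(doc_token_count: int, target_chunks: int = 50,
                   min_size: int = 256, max_size: int = 2048) -> int:
    """Hypothetical heuristic: scale chunk size with document length,
    clamped so chunks never get too small or too large."""
    return max(min_size, min(max_size, doc_token_count // target_chunks))

documents = SimpleDirectoryReader("data").load_data()
doc_tokens = sum(len(d.text.split()) for d in documents)  # rough token proxy

# pass the derived chunk size into the index build
index = GPTSimpleVectorIndex(documents, chunk_size_limit=chunk_size_for(doc_tokens))
```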
If you want smaller text chunks but still want to inject document metadata, you can set the extra_info property on the Document object. That metadata will be injected into every text chunk.
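A quick sketch of that, assuming a gpt_index version where the Document constructor accepts an extra_info dict (the field names and file path are just illustrative):

```python
from gpt_index import Document, GPTSimpleVectorIndex

full_text = open("reports/annual_2022.txt").read()

doc = Document(
    text=full_text,
    # extra_info is injected into every chunk derived from this document
    extra_info={"title": "2022 Annual Report", "source": "reports/annual_2022.txt"},
)

index = GPTSimpleVectorIndex([doc])
```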
You can also try defining a List index for each document, and then defining a simple vector index on top of those subindices through composability: https://gpt-index.readthedocs.io/en/latest/how_to/composability.html. Then when you retrieve a top-k "chunk", it will route the query to the underlying List index, which will synthesize over the entire document.
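Roughly, under an older gpt_index API this could look like the sketch below. Treat it as an approximation: the exact class names, the summary-text step, and the recursive query arguments vary between versions, so check the composability docs linked above for your release.

```python
from gpt_index import GPTListIndex, GPTSimpleVectorIndex, SimpleDirectoryReader

# one List index per document
doc1 = SimpleDirectoryReader("data/doc1").load_data()
doc2 = SimpleDirectoryReader("data/doc2").load_data()
index1 = GPTListIndex(doc1)
index2 = GPTListIndex(doc2)

# each subindex gets a summary text that the top-level vector index can embed against
index1.set_text("Summary of document 1")
index2.set_text("Summary of document 2")

# vector index over the subindices; a top-k hit routes into the matching List index,
# which then synthesizes an answer over that whole document
top_index = GPTSimpleVectorIndex([index1, index2])

# older versions route composed queries with mode="recursive"
# (some versions also want a query_configs argument describing each index type)
response = top_index.query(
    "What does document 1 say about revenue?",
    mode="recursive",
)
```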
If so, I think I need to find a way to make this sub-linear somehow, because I think directionally you're right, but I don't want to have to batch queries ahead of time (though that is an interesting concept for really good answers).
I could get the most asked questions and pre-render them.
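A trivial sketch of that idea, assuming the common questions are batched offline against whatever index is already built (the question list and data path are placeholders):

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

index = GPTSimpleVectorIndex(SimpleDirectoryReader("data").load_data())

# pre-render answers for the most-asked questions in an offline batch job
common_questions = [
    "What is the refund policy?",
    "How do I reset my password?",
]
answer_cache = {q: str(index.query(q)) for q in common_questions}

def answer(question: str) -> str:
    # serve a pre-rendered answer if we have one; otherwise fall back to a live query
    if question in answer_cache:
        return answer_cache[question]
    return str(index.query(question))
```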