The community members are discussing the trade-off between chunk size and LLM tokens when optimizing search queries. They note that reducing chunk size requires increasing the similarity_top_k parameter, which can be suboptimal, and that larger documents compound the issue, leading to reduced answer quality. Suggestions include injecting document metadata, searching the document directly for more context, and using a more advanced search algorithm. The community members also discuss using a List index per document with a simple vector index on top for better performance, but note that List index queries may be O(N), which could be problematic.
What is the trade-off space between chunk size and LLM tokens?
I have been playing around with optimizing this, and there seems to be a floor on query performance as a function of chunk size, depending on document size. Increasing chunk size increases the number of LLM tokens sent for the query response, however.
I am thinking of parameterizing chunk size as a function of document size and optimizing search queries based on that, but I would appreciate general thoughts to vet the concept.
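A minimal sketch of what that parameterization could look like. The scaling heuristic and constants here are assumptions, and `chunk_size_limit` is the knob older gpt_index versions expose on index constructors; your version's parameter name may differ.

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

def chunk_size_for(doc_token_count: int, target_chunks: int = 50,
                   min_size: int = 256, max_size: int = 2048) -> int:
    """Hypothetical heuristic: scale chunk size with document length,
    clamped so chunks never get too small or too large."""
    return max(min_size, min(max_size, doc_token_count // target_chunks))

documents = SimpleDirectoryReader("data").load_data()
doc_tokens = sum(len(d.text.split()) for d in documents)  # rough token proxy

# pass the derived chunk size into the index build
index = GPTSimpleVectorIndex(documents, chunk_size_limit=chunk_size_for(doc_tokens))
```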
If you want smaller text chunks but still want to inject document metadata, you can set the extra_info property on the Document object. That metadata will be injected into every text chunk.
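A quick sketch of that, assuming a gpt_index version where the Document constructor accepts an extra_info dict (the field names and file path are just illustrative):

```python
from gpt_index import Document, GPTSimpleVectorIndex

full_text = open("reports/annual_2022.txt").read()

doc = Document(
    text=full_text,
    # extra_info is injected into every chunk derived from this document
    extra_info={"title": "2022 Annual Report", "source": "reports/annual_2022.txt"},
)

index = GPTSimpleVectorIndex([doc])
```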
You can also try defining a List index for each document, and then defining a simple vector index on top of those subindices through composability: https://gpt-index.readthedocs.io/en/latest/how_to/composability.html. Then when you retrieve a top-k "chunk", it will route the query to the underlying List index, which will synthesize over the entire document.
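Roughly, under an older gpt_index API this could look like the sketch below. Treat it as an approximation: the exact class names, the summary-text step, and the recursive query arguments vary between versions, so check the composability docs linked above for your release.

```python
from gpt_index import GPTListIndex, GPTSimpleVectorIndex, SimpleDirectoryReader

# one List index per document
doc1 = SimpleDirectoryReader("data/doc1").load_data()
doc2 = SimpleDirectoryReader("data/doc2").load_data()
index1 = GPTListIndex(doc1)
index2 = GPTListIndex(doc2)

# each subindex gets a summary text that the top-level vector index can embed against
index1.set_text("Summary of document 1")
index2.set_text("Summary of document 2")

# vector index over the subindices; a top-k hit routes into the matching List index,
# which then synthesizes an answer over that whole document
top_index = GPTSimpleVectorIndex([index1, index2])

# older versions route composed queries with mode="recursive"
# (some versions also want a query_configs argument describing each index type)
response = top_index.query(
    "What does document 1 say about revenue?",
    mode="recursive",
)
```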
If so, I think I need to find a way to make this sub-linear somehow, because I think directionally you're right, but I don't want to have to batch queries ahead of time (though that is an interesting concept for really good answers).
I could get the most asked questions and pre-render them.
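A trivial sketch of that idea, assuming the common questions are batched offline against whatever index is already built (the question list and data path are placeholders):

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

index = GPTSimpleVectorIndex(SimpleDirectoryReader("data").load_data())

# pre-render answers for the most-asked questions in an offline batch job
common_questions = [
    "What is the refund policy?",
    "How do I reset my password?",
]
answer_cache = {q: str(index.query(q)) for q in common_questions}

def answer(question: str) -> str:
    # serve a pre-rendered answer if we have one; otherwise fall back to a live query
    if question in answer_cache:
        return answer_cache[question]
    return str(index.query(question))
```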