What is the trade-off space between chunk size and LLM tokens?
I have been experimenting with optimizing this, and query performance seems to hit a floor at a certain chunk size that depends on document size. However, increasing chunk size also increases the number of LLM tokens sent per query.
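As a rough back-of-the-envelope view of that trade-off (the overhead numbers below are illustrative assumptions, not measurements), tokens sent per query grow roughly linearly with chunk size times the number of retrieved chunks:

```python
def estimate_query_tokens(chunk_size: int, similarity_top_k: int = 1,
                          prompt_overhead: int = 200, query_tokens: int = 50) -> int:
    """Approximate prompt tokens = retrieved chunks + prompt template + the query itself."""
    return similarity_top_k * chunk_size + prompt_overhead + query_tokens

for chunk_size in (256, 512, 1024, 2048):
    print(chunk_size, "->", estimate_query_tokens(chunk_size, similarity_top_k=2))
```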
I am thinking of parameterizing chunk size as a function of document size and optimizing search queries based on that, but I would appreciate general thoughts to vet the concept.
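A minimal sketch of what that parameterization could look like (the bounds and the target chunk count below are made-up placeholders to vet the idea, not tuned values):

```python
def chunk_size_for_document(doc_token_count: int,
                            min_chunk: int = 256,
                            max_chunk: int = 2048,
                            target_chunks: int = 20) -> int:
    """Pick a chunk size so each document splits into roughly `target_chunks` pieces,
    clamped to a range so very small or very large docs don't produce extreme sizes."""
    proposed = doc_token_count // target_chunks
    return max(min_chunk, min(max_chunk, proposed))

# Example: a ~10k-token document gets ~500-token chunks with these settings.
print(chunk_size_for_document(10_000))
```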
If you want smaller text chunks but still want to inject document metadata, you can set the extra_info property on the Document object. This metadata will be injected into every text chunk.
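For reference, that can look roughly like this (a minimal sketch; the positional text argument and extra_info keyword match older gpt-index releases and may differ in yours, and the metadata keys are arbitrary examples):

```python
from gpt_index import Document  # import path may differ by version

doc = Document(
    "full document text here...",
    extra_info={"title": "Q3 report", "source": "reports/q3.pdf"},  # example keys
)
# extra_info is injected into every chunk derived from this document,
# so each retrieved chunk still carries the document-level context.
```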
You can also try defining a list index for each document, and then defining a simple vector index on top of the sub-indices through composability: https://gpt-index.readthedocs.io/en/latest/how_to/composability.html. Then, when you retrieve a top-k "chunk", it will route the query to the underlying list index, which will synthesize over the entire document.
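Something like the sketch below, based on the composability pattern in those docs (the class names, set_text, and the mode="recursive" query argument are from an older gpt-index release and may differ or require query_configs in your version, so treat this as a sketch to adapt):

```python
from gpt_index import GPTListIndex, GPTSimpleVectorIndex, SimpleDirectoryReader

# Build one list index per document so a "hit" can synthesize over the whole doc.
doc_indices = []
for path in ("data/doc_a", "data/doc_b"):  # placeholder paths
    docs = SimpleDirectoryReader(path).load_data()
    index = GPTListIndex(docs)
    index.set_text("One-line summary of this document")  # text the parent index embeds
    doc_indices.append(index)

# Vector index composed over the per-document list indices.
graph = GPTSimpleVectorIndex(doc_indices)

# Recursive query: top-k retrieval at the vector level routes into the matching list index.
response = graph.query("What does document A say about X?", mode="recursive")
print(response)
```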
If so, I think I need to find a way to make this sub-linear somehow, because I think directionally you're right, but I don't want to have to batch queries ahead of time (though that is an interesting concept for producing really good answers).
I could collect the most frequently asked questions and pre-render answers for them.
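If you go that route, the pre-rendering step could be as simple as the sketch below (the question list and query_fn are hypothetical; query_fn would wrap whatever index query call you use, e.g. lambda q: str(index.query(q))):

```python
from typing import Callable, Dict, List

def build_prerendered_cache(questions: List[str],
                            query_fn: Callable[[str], str]) -> Dict[str, str]:
    """Precompute answers for the most frequently asked questions so repeat
    queries never hit the LLM at request time."""
    return {q: query_fn(q) for q in questions}

def answer(question: str, cache: Dict[str, str],
           query_fn: Callable[[str], str]) -> str:
    """Serve a cached answer when available, otherwise fall back to a live query."""
    return cache.get(question) or query_fn(question)

# Usage sketch with placeholder questions and a stand-in query function.
cache = build_prerendered_cache(["What is X?", "How do I Y?"],
                                lambda q: f"canned answer to {q!r}")
print(answer("What is X?", cache, lambda q: f"live answer to {q!r}"))
```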