What is the trade-off space between chunk size and LLM tokens?
I have been playing around with optimizing this, and there seems to be a floor on query performance as a function of chunk size, depending on document size. However, increasing chunk size also increases the number of LLM tokens sent per query response.
I am thinking of parameterizing chunk size as a function of document size and optimizing search queries based on that, but would appreciate general thoughts to vet the concept.
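Roughly what I have in mind, as a minimal sketch: compute a chunk size from total document length and pass it when building the index. The `chunk_size_limit` kwarg, the `GPTSimpleVectorIndex` constructor usage, and the scaling heuristic are my assumptions, not a confirmed API contract.

```python
# Sketch: pick a chunk size based on total document length, then build the index with it.
# Assumptions: gpt_index exposes SimpleDirectoryReader / GPTSimpleVectorIndex and the
# constructor accepts chunk_size_limit; the chars-to-tokens heuristic is made up.
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader


def choose_chunk_size(total_chars: int, min_size: int = 256, max_size: int = 1024) -> int:
    """Scale chunk size (in tokens) with document length, clamped to [min_size, max_size]."""
    proposed = total_chars // 100  # hypothetical heuristic: ~1 chunk token per 100 chars
    return max(min_size, min(max_size, proposed))


documents = SimpleDirectoryReader("data").load_data()
total_chars = sum(len(doc.text) for doc in documents)

index = GPTSimpleVectorIndex(documents, chunk_size_limit=choose_chunk_size(total_chars))
response = index.query("What is this document about?")
print(response)
```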
@jerryjliu0 have you thought about caching queries before?
I have a cool PoC for semantic query caching via Pinecone right now (could use the vector index instead), and I feel like there might be a place in gpt_index to slot this in, as opposed to shipping it as an external library.
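The rough shape of the PoC, for context: embed the incoming question, look up the nearest previously-answered question in Pinecone, and only hit the LLM on a cache miss. The pinecone/openai call signatures below match the client versions I'm on and may differ elsewhere; the index name, threshold, and embedding model are placeholders.

```python
# Sketch of a semantic query cache backed by Pinecone (placeholders throughout).
import uuid

import openai
import pinecone

pinecone.init(api_key="...", environment="...")
cache = pinecone.Index("query-cache")  # hypothetical pre-created cache index


def embed(text: str) -> list[float]:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return resp["data"][0]["embedding"]


def cached_query(question: str, answer_fn, threshold: float = 0.92) -> str:
    """Return a cached answer for a semantically similar question, else call answer_fn."""
    vector = embed(question)
    hits = cache.query(vector=vector, top_k=1, include_metadata=True)
    matches = hits["matches"]
    if matches and matches[0]["score"] >= threshold:
        return matches[0]["metadata"]["answer"]  # cache hit: no LLM call

    answer = answer_fn(question)  # e.g. index.query(question) against a gpt_index index
    cache.upsert([(str(uuid.uuid4()), vector, {"question": question, "answer": answer})])
    return answer
```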
Is the recommended approach for using Pinecone (or another vector store) to load the documents into the store with gpt_index as the interface (i.e., fresh index, create documents, insert into GPTPineconeIndex)?
And how would one interface with gpt_index when there is a pre-existing vector index in Pinecone?
I'm trying to decide whether the flexibility of interfacing with the store directly is better than going through the gpt_index abstraction for non-full-document Q&A (e.g., storing previous queries as a Q&A cache so LLM calls can be limited).
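For concreteness, the two options I'm weighing look roughly like this. I'm assuming `GPTPineconeIndex` accepts a `pinecone_index` handle and that passing an empty document list lets it wrap an already-populated index rather than re-inserting anything; please correct me if that's not how it works.

```python
# Sketch of the two integration options (assumed constructor signatures).
import pinecone
from gpt_index import Document, GPTPineconeIndex

pinecone.init(api_key="...", environment="...")
existing = pinecone.Index("my-prepopulated-index")  # pre-existing Pinecone index

# Option A: go through the gpt_index abstraction.
index = GPTPineconeIndex([], pinecone_index=existing)  # wrap without inserting documents
index.insert(Document("new doc text"))                 # subsequent writes via gpt_index
response = index.query("question about the corpus")

# Option B: talk to Pinecone directly (e.g. for the Q/A cache) and only use
# gpt_index for the full-document querying path.
placeholder_vector = [0.0] * 1536  # placeholder embedding of the right dimension
raw_hits = existing.query(vector=placeholder_vector, top_k=3, include_metadata=True)
```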
Any prior art on how to properly summarize a SimpleVectorIndex? I am seeing that it picks out subsets of my document, and mode="tree_summarize" doesn't seem to be a thing on this index type.
The goal is to summarize the index so that it can be placed in a TreeIndex for hierarchical organization, and then facilitate vector querying at query time for efficient retrieval.
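The workaround I've been considering, sketched below: get the summary from a list index over the same documents (which does seem to accept response_mode="tree_summarize"), attach it to the vector index, and compose. The `set_text`, tree-over-indices composition, and `mode="recursive"` calls are my reading of the composability docs rather than something I've verified, so treat the signatures as assumptions.

```python
# Sketch: summarize via a list index, attach the summary to the vector index,
# then nest the vector index under a tree index (assumed composability API).
from gpt_index import (
    GPTListIndex,
    GPTSimpleVectorIndex,
    GPTTreeIndex,
    SimpleDirectoryReader,
)

documents = SimpleDirectoryReader("data").load_data()

vector_index = GPTSimpleVectorIndex(documents)

# Summarize over the full document set rather than the retrieved subset.
summary = GPTListIndex(documents).query(
    "Summarize this document in a few sentences.",
    response_mode="tree_summarize",
)
vector_index.set_text(str(summary))  # summary becomes this index's node text in the tree

# Repeat the above per document, then organize the summarized indices hierarchically.
tree_index = GPTTreeIndex([vector_index])

# Query routes through the tree and down into the vector index for retrieval;
# depending on version this may also need query_configs.
response = tree_index.query("specific question about the corpus", mode="recursive")
print(response)
```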