When top_k is 10, it has to send data to the LLM (up to) 10 times, refining the response as it goes, so it's inherently a sequential process (at least as far as I understand it)
Streaming responses have been a hot request lately; I wouldn't doubt it's being worked on soon
@Logan M interesting, I actually thought it sends all retrieved paragraphs at once. Can you tell me what the actual default LLM query looks like, for the first paragraph and then for the following ones?
Hopefully @jerryjliu0 can shed some light on this
@erajasekar we now have async support, try setting response_mode="tree_summarize"
when you have similarity_top_k=10! This allows us to parallelize LLM calls
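To illustrate why tree_summarize can be parallelized while refine cannot: summaries at the same level of the tree don't depend on each other, so they can be issued concurrently. A minimal sketch with asyncio, using a stub `summarize` coroutine in place of a real LLM call (the function names and tree-merging details here are illustrative assumptions, not LlamaIndex internals):

```python
import asyncio

# Stand-in for an async LLM call; a real one would await an API request.
async def summarize(text: str) -> str:
    await asyncio.sleep(0)  # simulate non-blocking I/O
    return f"summary({text})"

async def tree_summarize(chunks: list[str]) -> str:
    # Summarize all chunks concurrently, then repeatedly pair up the
    # summaries and summarize the pairs, until one root summary remains.
    summaries = list(await asyncio.gather(*(summarize(c) for c in chunks)))
    while len(summaries) > 1:
        pairs = [" ".join(summaries[i:i + 2]) for i in range(0, len(summaries), 2)]
        summaries = list(await asyncio.gather(*(summarize(p) for p in pairs)))
    return summaries[0]

result = asyncio.run(tree_summarize(["chunk1", "chunk2", "chunk3", "chunk4"]))
print(result)
```

Each `asyncio.gather` level fires its LLM calls in parallel, which is what makes this faster than refine's strictly sequential chain.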
@jerryjliu0 @Logan M Is there a way to retrieve the top_k paragraphs using embeddings and send all of them to the LLM at once to generate an answer? It may lose context when they're sent one by one.
yeah try response_mode="compact"
that will "compact" the prompt with as many retrieved texts as possible
Thanks @jerryjliu0! How will they be written in the prompt in this case?
Regardless, I'll give it a try. My current problem is that I have one index file (JSON) per original document, and I couldn't find a way to merge them into one index (other than using composability, which creates another layer)
@yoelk are you trying to combine everything into one vector index?
with response_mode="compact", each paragraph will be "stuffed" into the prompt up until the prompt is full
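The "stuffing" behavior described above can be sketched in plain Python: greedily pack retrieved chunks into as few prompts as possible, starting a new prompt only when the budget would be exceeded. This is a simplified illustration (it counts characters for clarity; the real implementation works with token counts and prompt templates):

```python
def compact_prompts(chunks: list[str], budget: int) -> list[str]:
    # Greedily "stuff" retrieved chunks into prompts, starting a new
    # prompt whenever adding the next chunk would exceed the budget.
    prompts, current = [], ""
    for chunk in chunks:
        candidate = (current + "\n" + chunk) if current else chunk
        if len(candidate) <= budget:
            current = candidate
        else:
            if current:
                prompts.append(current)
            current = chunk
    if current:
        prompts.append(current)
    return prompts

print(compact_prompts(["aaaa", "bbbb", "cccc"], budget=9))
# → ['aaaa\nbbbb', 'cccc']
```

With a large enough context window all retrieved paragraphs end up in a single prompt, which is exactly the "send everything at once" behavior asked about above.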
@jerryjliu0 yes, as the indices were created in a parallelized way
i see, that's interesting. We don't have full support for "converting" indices to other indices atm but that's something we can look into
@jerryjliu0 what do you mean by converting? I'm not trying to convert them, just want to have one index.json (SimpleVectorIndex) instead of multiple SimpleVectorIndex files
yeah that's what i meant - i was thinking of that as "conversion"
Got it. I think this is an important use case, not just for parallel computing but also when document files are uploaded over time, so that a new index is created at each upload
@jerryjliu0 I'll add that to the feature requests
out of curiosity why are you creating a new index for each document vs. adding the document to an existing index?
@jerryjliu0 whenever document files are being uploaded by the user, there's a new instance of an AWS lambda that calculates the index and saves it
@jerryjliu0 So per input text file I create an index file in an async way
@jerryjliu0 Happy to hear your thoughts if you have another way of implementing it in a parallelized way and still have one index file at the end of the process
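Since merging isn't supported natively yet, one workaround is to merge the saved JSON files directly. The sketch below assumes each saved index is a JSON mapping of node id to its record (text plus embedding); the real SimpleVectorIndex on-disk format is different and more nested, so treat both the schema and the `merge_index_files` helper as hypothetical illustrations of the merge step only:

```python
import json

def merge_index_files(paths: list[str], out_path: str) -> None:
    # Assumed schema: each file is a JSON dict {node_id: {"text": ...,
    # "embedding": [...]}}. Merging is then just a union of the dicts;
    # later files win if two lambdas ever produced the same node id.
    merged: dict = {}
    for path in paths:
        with open(path) as f:
            merged.update(json.load(f))
    with open(out_path, "w") as f:
        json.dump(merged, f)
```

Each Lambda would still write its own file; a final step (or an S3-triggered function) would fold them into one. Re-inserting the original documents into a single index is the safer route once that's supported.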
Hi everyone. Is there a simple way to tell the query method on a GPTSimpleVectorIndex to use top_k=2? It looks like it's set to top_k=1 by default. I couldn't figure this out from the documentation, so apologies if it's a silly question. Cheers
Try adding similarity_top_k=2
to the query call
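For intuition on what similarity_top_k controls: before any text reaches the LLM, the index ranks stored chunks by cosine similarity between their embeddings and the query embedding, and keeps the k best. A self-contained sketch of that ranking step (pure Python, not LlamaIndex's actual retrieval code):

```python
import math

def top_k(query_emb: list[float], doc_embs: list[list[float]], k: int = 2) -> list[int]:
    # Return the indices of the k documents whose embeddings have the
    # highest cosine similarity to the query embedding.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    ranked = sorted(range(len(doc_embs)),
                    key=lambda i: cos(query_emb, doc_embs[i]),
                    reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0]], k=2))
# → [2, 1]
```

Raising similarity_top_k from 1 to 2 just widens this cut from the single best chunk to the two best, which the chosen response_mode then feeds to the LLM.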