I am using similarity_top_k=10 to query

At a glance

The post discusses the performance of querying a vector index with similarity_top_k=10, which takes a long time to produce the final answer. Community members suggest exploring ways to stream responses after completing the prompt for each selected passage, similar to ChatGPT. They discuss the possibility of manually achieving this by selecting similarity_top_k=1 ten times.

In the comments, community members provide insights and suggestions. They explain that when top_k is 10, the system has to send data to the LLM (up to) 10 times, which is a sequential process. They suggest trying response_mode="tree_summarize" with similarity_top_k=10 to parallelize the LLM calls, and response_mode="compact" to "compact" the prompt with as many retrieved texts as possible.

The community members also discuss the challenge of merging multiple vector index files into a single index, which is an important use case for parallel computing and when documents are uploaded over time. They suggest adding this as a feature request.

Finally, a community member asks a simple question about how to set top_k to 2, which is answered: pass similarity_top_k=2 to the query call.

I am using similarity_top_k=10 to query a vector index, but it takes a long time to get the final answer based on the best matches. Is there a way to stream responses after completing the prompt for each selected passage, like ChatGPT? If the query API doesn't provide that, is it possible to achieve it manually by selecting similarity_top_k=1 ten times?
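For readers skimming: the pattern being asked about looks roughly like this (a sketch against the gpt_index API of the time; the file name and question are placeholders):

```python
from gpt_index import GPTSimpleVectorIndex

# Load a previously built vector index from disk (placeholder file name).
index = GPTSimpleVectorIndex.load_from_disk("index.json")

# Retrieve the 10 most similar passages, then synthesize an answer from them.
response = index.query(
    "What does the document say about X?",
    similarity_top_k=10,
)
print(response)
```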
23 comments
When top_k is 10, it has to send data to the LLM (up to) 10 times, refining the response as it goes, so it is inherently a sequential process (at least as far as I understand it)

Streaming responses has been a hot request lately; I wouldn't doubt it's being worked on soon
@Logan M interesting, I actually thought it sends all retrieved paragraphs at once. Can you tell me what the actual default LLM query looks like, for the first paragraph and then for the following ones?
Hopefully @jerryjliu0 can shed some light on this
See the text QA prompt and refine prompt here (lines 88-111) for examples of the queries the vector index uses:

https://github.com/jerryjliu/gpt_index/blob/main/gpt_index/prompts/default_prompts.py
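For reference, those two templates look roughly like this (paraphrased from memory; the linked file is authoritative):

```python
# Paraphrase of gpt_index/prompts/default_prompts.py; see the link above
# for the exact wording.

# First retrieved chunk: answer the question from the context alone.
TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

# Chunks 2..k: refine the previous answer with each new chunk in turn,
# which is why the default mode is inherently sequential.
REFINE_PROMPT_TMPL = (
    "The original question is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question. "
    "If the context isn't useful, return the original answer.\n"
)
```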
@erajasekar we now have async support, try setting response_mode="tree_summarize" when you have similarity_top_k=10! This allows us to parallelize LLM calls
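In code, that suggestion is a one-argument change (a sketch; the parallel LLM calls happen inside the library):

```python
response = index.query(
    "What does the document say about X?",
    similarity_top_k=10,
    response_mode="tree_summarize",  # summarize retrieved chunks as a tree; LLM calls can run concurrently
)
```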
@jerryjliu0 @Logan M Is there a way to retrieve the top_k paragraphs using embeddings and send all of them to the LLM at once to generate an answer? It may lose context when they're sent one by one.
yeah try response_mode="compact"
that will "compact" the prompt with as many retrieved texts as possible
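Likewise for compact mode (a sketch):

```python
response = index.query(
    "What does the document say about X?",
    similarity_top_k=10,
    response_mode="compact",  # pack as many retrieved texts as fit into each prompt
)
```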
Thanks @jerryjliu0! How will they be written in the prompt in this case?
Regardless, I will give it a try. My current problem is that I have an index file (JSON) per original document, and I couldn't find a way to merge them into one index (unless using composability, which creates another layer)
@yoelk are you trying to combine everything into one vector index?
with response_mode="compact", each paragraph will be "stuffed" into the prompt up until the prompt is full
@jerryjliu0 yes, as the indices were created in a parallelized way
i see, that's interesting. We don't have full support for "converting" indices to other indices atm but that's something we can look into
@jerryjliu0 what do you mean by converting? I'm not trying to convert them, just want to have one index.json (SimpleVectorIndex) instead of multiple SimpleVectorIndex files
yeah that's what i meant - i was thinking of that as "conversion"
Got it. I think this is an important use case, not just for parallel computing but also when document files are uploaded over time, so that a new index is created at upload
@jerryjliu0 I'll add that in feature requests
out of curiosity why are you creating a new index for each document vs. adding the document to an existing index?
@jerryjliu0 whenever document files are uploaded by the user, a new instance of an AWS Lambda calculates the index and saves it
@jerryjliu0 So per input text file I create an index file in an async way
@jerryjliu0 Happy to hear your thoughts if you have another way of implementing it in a parallelized way while still having one index file at the end of the process
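Pending first-class support, one possible workaround is to load each per-document index, pull the stored chunk text back out, and re-insert it into a single index. This is only a sketch: the docstore attribute names are from the gpt_index version of the time and may differ, and re-inserting re-embeds every chunk, so it costs extra embedding calls:

```python
from gpt_index import Document, GPTSimpleVectorIndex

merged = GPTSimpleVectorIndex([])  # start from an empty vector index

for path in ["index_0.json", "index_1.json"]:  # placeholder per-document index files
    part = GPTSimpleVectorIndex.load_from_disk(path)
    # docstore.docs maps node ids to stored nodes (version-dependent attribute).
    for node in part.docstore.docs.values():
        merged.insert(Document(node.get_text()))  # re-embeds the chunk text

merged.save_to_disk("merged_index.json")
```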
Hi everyone. Is there a simple way to tell the query method on a GPTSimpleVectorIndex to include top_k=2? It looks like it's set to top_k=1 by default. I couldn't figure this out from the documentation, so apologies if it's a silly question. Cheers
Try adding similarity_top_k=2 to the query call πŸ‘
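That is, something like this (the default is similarity_top_k=1):

```python
response = index.query("your question here", similarity_top_k=2)
```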