
Updated 4 months ago

Query speedup

Hi, I'm using a list index but I need to speed up the processing time. My use case doesn't allow me to risk missing important information, so a list index is required. Is there a way, maybe with tree summarize, to send out multiple API calls at once?
16 comments
I think if you are using response_mode="tree_summarize" already, you can get a decent speedup using async

https://github.com/jerryjliu/llama_index/blob/main/examples/async/AsyncQueryDemo.ipynb
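
To illustrate why async helps with a tree-summarize style response, here is a minimal sketch in plain Python. It does not use the llama_index API: `fake_llm_summarize` is a hypothetical stand-in for one LLM call, and `fan_in` is an assumed grouping parameter. The point is only that all calls on one level of the tree are independent, so they can be sent at once.

```python
import asyncio


async def fake_llm_summarize(question: str, texts: list[str]) -> str:
    """Hypothetical stand-in for one LLM API call (not a llama_index function)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return " | ".join(texts)   # a real call would return a summary instead


async def tree_summarize(question: str, chunks: list[str], fan_in: int = 2) -> str:
    """Merge groups of `fan_in` texts level by level until one answer remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = chunks
    while len(level) > 1:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        # Calls within one level do not depend on each other,
        # so they are issued concurrently instead of one by one.
        level = list(await asyncio.gather(
            *(fake_llm_summarize(question, group) for group in groups)
        ))
    return level[0]


answer = asyncio.run(
    tree_summarize("What are the key risks?", ["a", "b", "c", "d", "e"])
)
print(answer)
```

With sequential calls the wait grows with the number of chunks; with concurrent calls it grows roughly with the number of tree levels, subject to whatever rate limits the API enforces.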


Otherwise, try increasing the chunk size (but only if you previously shrank it)
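
As a rough back-of-the-envelope check on the chunk size suggestion: a list index visits every chunk, so fewer, larger chunks mean fewer LLM calls. The sketch below is only arithmetic. The 500 tokens per page figure is an assumption, and it ignores chunk overlap and prompt overhead.

```python
import math


def num_llm_calls(total_tokens: int, chunk_size: int) -> int:
    """Approximate number of chunks, and so of per-chunk LLM calls."""
    return math.ceil(total_tokens / chunk_size)


# Assumption: a 100-page document at roughly 500 tokens per page.
total_tokens = 100 * 500

for chunk_size in (512, 1024, 4000):
    print(chunk_size, num_llm_calls(total_tokens, chunk_size))
```

Under these assumptions, going from 512-token to 4000-token chunks cuts the number of calls from 98 to 13.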
Some documents are 100+ pages. Any recommendations on how to speed it up? I don't care about the cost of the LLM as much as getting a response faster.
@Logan M I'm new to Discord and don't know if I need to tag you or not, haha
Haha I'm on it

A tree index will be faster to query than a list index, but a little slower to build

Use mode="summarize" instead of response_mode for tree indexes though (if you are summarizing)
Great! The key part is that I can't miss any information, so that limits me to a high similarity top-k on a vector index, or to a list/tree index. Does a tree miss any information?
Is there no way to split up the index, simultaneously ask the same question across all nodes, and then synthesize the response?
Mmm, a tree summarizes a lot of info (it basically builds a bottom-up tree of summaries and then uses that to query).

Assuming it does a good job it should work well 🤔
That's a composable index! You could wrap a few list indexes with a top level index

https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/composability.html
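
The "split up, ask everywhere, then synthesize" idea can be sketched without any library. This is not the llama_index composability API: `fake_llm`, `query_sub_index`, and `query_composed` are hypothetical names, and each sub-index is modeled as a plain list of node texts. Each sub-index is queried list-style (every node is included), the sub-queries run concurrently, and one final call combines the partial answers.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it echoes the prompt in brackets."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{prompt}]"


async def query_sub_index(question: str, nodes: list[str]) -> str:
    # List-style query: every node goes into the prompt, none is filtered out.
    return await fake_llm(f"{question}: " + ", ".join(nodes))


async def query_composed(question: str, sub_indexes: list[list[str]]) -> str:
    # Ask the same question of every sub-index at the same time.
    partial_answers = await asyncio.gather(
        *(query_sub_index(question, nodes) for nodes in sub_indexes)
    )
    # One top-level call synthesizes the partial answers.
    return await fake_llm("combine " + " ".join(partial_answers))


result = asyncio.run(query_composed("Q", [["n1", "n2"], ["n3"]]))
print(result)
```

Every node reaches some sub-query, but the final synthesis step is still a summary, so whether nothing is lost depends on how well each LLM call preserves the details.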
Here are my configurations for the tree index. I'll give it a try. Any recommendations?

{
    "query_str": user_question,
    "mode": "embedding",
    "service_context": service_context,
    "verbose": True,
}

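from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext

def initialize_service_context():
    max_input_size = 8000   # maximum prompt size in tokens
    num_output = 1500       # tokens reserved for the LLM's answer
    max_chunk_overlap = 20  # token overlap between adjacent chunks

    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4", request_timeout=1500))

    return ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)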
Hmm looks pretty good to me! I think the mode could be either embedding or summarize, I'm not sure what will work best for you 😅
Weird, I tried those settings and it only returned the prompt, with an empty response...
That's... strange 🤔
It's happened many times before: it shows all the nodes but doesn't include them in the response. I'm running on default now and will post back.
It usually doesn't happen on default.
Hmm, I need to play more with the tree index lol