
SummaryIndex

Hello, team. I am building a RAG pipeline with my local LLM. Following the instructions at https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html, I am able to generate a response, but query_engine.query("xxx") takes a long time. To figure out the cause, I added print statements to my custom LLM class and found that query_engine.query("xxx") calls complete() many times. How can I make query_engine.query("xxx") call complete() only once during the query stage?
Plain Text
from llama_index.core import SimpleDirectoryReader, SummaryIndex

# Load the PDF and build a summary index over its chunks
documents = SimpleDirectoryReader(input_files=["/mnt/data/model_zoo/BCEmbedding/BCEmbedding/tools/eval_rag/eval_pdfs/Comp_en_llama2.pdf"]).load_data()
index = SummaryIndex.from_documents(documents)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("What is Llama 2?")
print(response)
6 comments
A summary index will send every node to the LLM over multiple LLM calls
This is most useful for queries that require all of the data
Like summaries
You probably meant to use a vector index?
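To illustrate the behavior described above, here is a hypothetical, self-contained sketch (toy code, not the LlamaIndex implementation) of why a summary index's call count grows with the number of nodes: a create-and-refine style synthesizer sends each node to the LLM in turn, refining the running answer, so a 10-chunk document means roughly 10 complete() calls.

```python
# Hypothetical sketch of create-and-refine response synthesis.
# Each node is sent to the LLM in turn, so call count grows with node count.

def fake_llm_complete(prompt: str) -> str:
    """Stand-in for a custom LLM's complete(); just echoes something."""
    return f"answer({len(prompt)})"

def summarize_all_nodes(nodes, question):
    calls = 0
    answer = ""
    for node in nodes:
        # The first call answers from the first node; each later call
        # refines the running answer with the next node's text.
        prompt = f"{question}\ncontext: {node}\nexisting answer: {answer}"
        answer = fake_llm_complete(prompt)
        calls += 1
    return answer, calls

nodes = [f"chunk {i}" for i in range(10)]  # a 10-chunk document
_, calls = summarize_all_nodes(nodes, "What is Llama 2?")
print(calls)  # one LLM call per node -> 10
```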
Thanks for your help. I'd like to limit it to a single LLM call. Can I do that with a vector index?
A vector index returns only the top 2 nodes by default, and this typically requires only one LLM call for most LLMs
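A minimal sketch of what top-k retrieval changes (toy embeddings and cosine similarity here, not the LlamaIndex implementation): only the k best-matching chunks are stuffed into a single prompt, so one complete() call suffices regardless of document size.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_top_k(query_vec, node_vecs, k=2):
    """Rank nodes by similarity to the query and keep only the top k."""
    scored = sorted(node_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy embeddings; in practice these come from an embedding model.
nodes = {"chunk-a": [1.0, 0.0], "chunk-b": [0.9, 0.1], "chunk-c": [0.0, 1.0]}
top = retrieve_top_k([1.0, 0.0], nodes, k=2)
print(top)  # ['chunk-a', 'chunk-b'] -> packed into one prompt, one LLM call
```

In LlamaIndex itself, the switch is swapping SummaryIndex for VectorStoreIndex.from_documents(documents) and optionally setting index.as_query_engine(similarity_top_k=2) to control how many nodes are retrieved.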