Yan Lu
Joined September 25, 2024

SummaryIndex

Hello, team. I am building a RAG pipeline with my local LLM. Following the instructions at https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html, I am able to generate a response, but query_engine.query("xxx") takes a long time. To find the cause, I added print statements to my custom LLM class and found that query_engine.query("xxx") calls complete many times. How can I make query_engine.query("xxx") call complete only once during the query stage?
Plain Text
documents = SimpleDirectoryReader(input_files=["/mnt/data/model_zoo/BCEmbedding/BCEmbedding/tools/eval_rag/eval_pdfs/Comp_en_llama2.pdf"]).load_data()
index = SummaryIndex.from_documents(documents)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("What is Llama 2?")
print(response)
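For context on why this happens: SummaryIndex does not retrieve a subset of nodes; its query engine feeds every document chunk through the LLM (by default with a refine-style synthesis loop), so a custom LLM's complete() runs roughly once per chunk. The sketch below is not the llama_index source, just a minimal stand-in that mimics that loop and counts the calls:

```python
class CountingLLM:
    """Stand-in for a custom LLM that counts complete() calls."""
    def __init__(self):
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"answer-{self.calls}"


def refine_query(llm, question, chunks):
    """Simplified refine synthesis: one LLM call per chunk,
    each call improving the previous draft answer."""
    answer = ""
    for chunk in chunks:
        answer = llm.complete(
            f"Q: {question}\nContext: {chunk}\nPrevious draft: {answer}"
        )
    return answer


# A multi-page PDF like Comp_en_llama2.pdf yields many chunks.
chunks = [f"chunk {i}" for i in range(5)]
llm = CountingLLM()
refine_query(llm, "What is Llama 2?", chunks)
print(llm.calls)  # → 5: one complete() call per chunk
```

If I recall the llama_index API correctly, the usual fixes are to build a VectorStoreIndex instead (so only the top-k retrieved chunks reach the LLM, e.g. index.as_query_engine(similarity_top_k=1)), or to pass response_mode="compact" so multiple chunks are packed into fewer prompts. Treat the exact parameter names as assumptions to verify against your installed llama_index version.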