Hello, team. I am building a RAG pipeline with my local LLM, following the instructions at https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html. I can generate a response, but query_engine.query("xxx") takes a long time. To find the cause, I added a print statement to my custom LLM class and discovered that a single query_engine.query("xxx") calls complete many times. How can I make query_engine.query("xxx") call complete only once during the query stage?
from llama_index.core import SimpleDirectoryReader, SummaryIndex

# Load the PDF and build a summary index over all of its chunks
documents = SimpleDirectoryReader(
    input_files=["/mnt/data/model_zoo/BCEmbedding/BCEmbedding/tools/eval_rag/eval_pdfs/Comp_en_llama2.pdf"]
).load_data()
index = SummaryIndex.from_documents(documents)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("What is Llama 2?")
print(response)
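One way to cut this down to (close to) a single completion call, sketched under the assumption of a llama_index >= 0.10 package layout and an embedding model already configured in Settings: switch from SummaryIndex, which feeds every chunk to the LLM, to a VectorStoreIndex that retrieves only the top-matching chunk, and use the "compact" response mode so the retrieved text is packed into as few prompts as possible.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader(
    input_files=["/mnt/data/model_zoo/BCEmbedding/BCEmbedding/tools/eval_rag/eval_pdfs/Comp_en_llama2.pdf"]
).load_data()

# Embed chunks so retrieval can pick only the relevant ones
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    similarity_top_k=1,       # retrieve a single node instead of every chunk
    response_mode="compact",  # pack context into as few LLM prompts as possible
)
response = query_engine.query("What is Llama 2?")
print(response)
```

If you genuinely need a whole-document summary (the use case SummaryIndex is built for), many calls are hard to avoid; in that case response_mode="tree_summarize" or a larger context_window on your custom LLM reduces, but does not eliminate, the call count.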