@Logan M While using LlamaIndex for a chat engine, is there a way to use one query to retrieve a node and then a different query/prompt to generate the response, in the same call?
For a chat engine, it's a little harder. An agent would do the above though (i.e. write a query, and then interpret the response).
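Outside of a chat engine, the two-step flow can be done manually: retrieve nodes with one query, then synthesize an answer against a different prompt. A rough sketch only; the retrieval query and generation prompt below are placeholders, `index` is any already-loaded index, and it assumes the same legacy llama_index import style as the code later in this thread:

# Rough sketch: retrieve with one query, answer a different prompt.
from llama_index.response_synthesizers import get_response_synthesizer

retrieval_query = "enterprise pricing details"            # hypothetical query
generation_prompt = "Summarise the pricing in one line"   # hypothetical prompt

nodes = index.as_retriever().retrieve(retrieval_query)    # `index` assumed already loaded
synthesizer = get_response_synthesizer()
response = synthesizer.synthesize(generation_prompt, nodes)
print(response)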
@Logan M Also I have one more question I need your help with: currently my retriever is taking around 3.5-4.5 seconds to retrieve the chunk. How can I reduce this time?
Should I change to a different type of index, or would moving to Pinecone for the retrieval process work for me?
Currently I am using it like this:

from llama_index import StorageContext, load_index_from_storage

def build_vector_and_creating_insights_index_store_new(company):
    # Load the persisted index for this company from disk and cache it
    key = f"{company}/html_index"
    storage_context_map[key] = StorageContext.from_defaults(
        persist_dir=f"client_data/{company}/html_index"
    )
    index_map[key] = load_index_from_storage(
        storage_context_map[key],
        service_context=get_service_context(company),
    )
    return index_map[key]


index = build_vector_and_creating_insights_index_store_new("vwo")
retriever = index.as_retriever(
    retriever_mode="llm",
    choice_batch_size=1,
)
nodes = retriever.retrieve(req.query)
Please suggest some optimisations.
Definitely, using something like Pinecone, Qdrant, or Weaviate will be faster.
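For example, a minimal sketch of pointing the same kind of index at Qdrant (Pinecone and Weaviate look very similar). The host, port, collection name, and the `documents` variable are assumptions for illustration:

import qdrant_client
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import QdrantVectorStore

# Assumes a local Qdrant instance and an already-loaded `documents` list
client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="vwo_html_index")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build once; afterwards retrieval is served by Qdrant's ANN search
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
nodes = index.as_retriever(similarity_top_k=3).retrieve(req.query)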
Sure, thank you @Logan M