
Hi @Logan M, is there a way to specify GPU usage in the query engine for semantic search in LlamaIndex?
Python

    query_engine = index.as_query_engine(
        text_qa_template=qa_prompt_tmpl,
        similarity_top_k=rag_nodes_top_k,
        response_synthesizer=response_synthesizer,
        verbose=True,
    )

Does it do parallel computation under the hood with the vectors in the vector store? I'd appreciate any suggestions to speed up the semantic search.
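
For reference, a minimal sketch of moving the embedding model onto a GPU. This assumes a local HuggingFace embedding model; the model name is just an example, and the import path varies by llama-index version:

Python

    from llama_index.core import Settings
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    # device="cuda" runs the embedding model on the GPU; the vector
    # similarity lookup itself still happens inside the vector store.
    Settings.embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-small-en-v1.5",  # example model, not a recommendation
        device="cuda",
    )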
Retrieval is definitely not the bottleneck. The slow part is actually generating the answer with the LLM.
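
If LLM generation is the slow step, streaming the response is one common mitigation: it doesn't shorten total generation time, but the first tokens arrive much sooner. A sketch using the standard LlamaIndex streaming option:

Python

    # streaming=True returns a StreamingResponse whose tokens can be
    # printed as they arrive instead of waiting for the full answer.
    query_engine = index.as_query_engine(
        text_qa_template=qa_prompt_tmpl,
        similarity_top_k=rag_nodes_top_k,
        streaming=True,
    )
    response = query_engine.query("your question here")
    response.print_response_stream()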

Not sure what your top-k or chunk size is, but response times increase as either of those parameters gets larger.
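
To illustrate the trade-off: smaller chunks and a lower top-k mean less retrieved text is packed into the LLM prompt, which usually shortens generation time at some cost to answer quality. The values below are illustrative, not recommendations:

Python

    from llama_index.core import Settings

    Settings.chunk_size = 512  # applies when the index is (re)built; default is 1024
    query_engine = index.as_query_engine(similarity_top_k=2)  # fewer chunks per query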