
Hi @Logan M, is there a way to specify GPU usage in the query engine for semantic search in LlamaIndex?
Python

    query_engine = index.as_query_engine(
        text_qa_template=qa_prompt_tmpl,
        similarity_top_k=rag_nodes_top_k,
        response_synthesizer=response_synthesizer,
        verbose=True,
    )

Does it do parallel computation under the hood with the vectors in the vector store? I'd appreciate any suggestions to speed up the semantic search.
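
For reference, a minimal sketch of moving the embedding model onto a GPU. This assumes a local HuggingFace embedding model; the model name is just an example, and the import path varies by llama-index version:

Python

    from llama_index.core import Settings
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    # device="cuda" runs the embedding model on the GPU; the vector
    # similarity lookup itself still happens inside the vector store.
    Settings.embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-small-en-v1.5",  # example model, not a recommendation
        device="cuda",
    )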
Retrieval is definitely not the bottleneck. The slow part is actually generating the answer with the LLM.
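
If LLM generation is the slow step, streaming the response is one common mitigation: it doesn't shorten total generation time, but the first tokens arrive much sooner. A sketch using the standard LlamaIndex streaming option:

Python

    # streaming=True returns a StreamingResponse whose tokens can be
    # printed as they arrive instead of waiting for the full answer.
    query_engine = index.as_query_engine(
        text_qa_template=qa_prompt_tmpl,
        similarity_top_k=rag_nodes_top_k,
        streaming=True,
    )
    response = query_engine.query("your question here")
    response.print_response_stream()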

Not sure what your top-k or chunk size is, but response times increase as either of those parameters gets larger.
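
To illustrate the trade-off: smaller chunks and a lower top-k mean less retrieved text is packed into the LLM prompt, which usually shortens generation time at some cost to answer quality. The values below are illustrative, not recommendations:

Python

    from llama_index.core import Settings

    Settings.chunk_size = 512  # applies when the index is (re)built; default is 1024
    query_engine = index.as_query_engine(similarity_top_k=2)  # fewer chunks per query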