Hybrid

Hello everyone,

I'm looking to improve the context retrieval performance of my RAG system. Currently, we're using Qdrant with approximately 100k vectors. I experimented with Qdrant's hybrid search following this documentation: https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/. While I really liked the improved results from hybrid search, the response time increased dramatically, from around 700ms to between 8 and 11 seconds, which is impractical for my application.

Does anyone have suggestions on how to optimize this response time?
1 comment
Yeah, hybrid search runs a sparse embedding model locally, which will be very slow without CUDA.

Alternatively, you can configure a BM25 approach, which should be just as good:

Python
vector_store = QdrantVectorStore(..., fastembed_sparse_model="Qdrant/bm25")
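
For reference, here's a fuller sketch of what that setup might look like end to end. The client URL and collection name are placeholders for illustration; adjust them for your own deployment:

Python
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Placeholder connection details; point these at your own Qdrant instance.
client = QdrantClient(url="http://localhost:6333")

vector_store = QdrantVectorStore(
    collection_name="my_collection",       # placeholder collection name
    client=client,
    enable_hybrid=True,                    # keep hybrid (dense + sparse) retrieval on
    fastembed_sparse_model="Qdrant/bm25",  # BM25 sparse vectors instead of a neural sparse model
)

BM25 builds its sparse vectors from term statistics rather than a neural encoder, so there's no per-query model inference to pay for, which is what avoids the multi-second latency you're seeing with the default sparse model.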