
Updated 6 months ago

We are migrating from pgvector to qdrant

We have a DB of 200k chunks/vectors.
When we uploaded this database to pgvector, the complete embedding + DB upload took about 4 hours.
Now we want to migrate to Qdrant (retrieval speed is better), but the embedding + upload process is taking more than 10 days!! We have really searched everywhere and have no idea how to resolve this slowness issue.
You have enable_hybrid=True, which by default runs a model locally to generate sparse embeddings -- this will be super slow if it's running on CPU.
Turn that off and you will see it run 1000x faster.
Optionally, if you still want sparse embeddings, you can customize how they are generated (maybe you have an external API to call, or some other method that will be faster).
Thank you @Logan M for your answer.
Do you know why I'm getting this error now:
UnexpectedResponse: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Wrong input: Not existing vector name error: "},"time":0.018649379}'
Hmm, probably related to switching enable_hybrid=False on the same collection? (Assuming that's what you did)
Yes, I set enable_hybrid to False. I also created a new collection but still get the same error.
Hmmm, do you have an outdated version of the vector store? pip install -U llama-index-vector-stores-qdrant
I just re-installed it and still have the same error.
Hmm. I can try to replicate, but I'm like 99% sure it will work fine for me 😅

Just to confirm, what is the exact code you are running? I.e., how do you create the vector store, and how are you inserting?
I think I saw in your notebook that you were creating collections manually, which might be the issue.
I'm running exactly the code in this Jupyter notebook.
When you say I'm creating the collection manually, what can I do instead, please?
if not client.collection_exists(collection_name=COLLECTION_NAME):
    client.create_collection(
        collection_name=COLLECTION_NAME,
        optimizers_config=models.OptimizersConfigDiff(indexing_threshold=0,),
        hnsw_config=models.HnswConfigDiff(on_disk=True),
        vectors_config={
            "text-dense": models.VectorParams(
                size=3072, 
                distance=models.Distance.COSINE,
            )
        },
        sparse_vectors_config={
            "text-sparse": models.SparseVectorParams(
                index=models.SparseIndexParams()
            )
        },
    )


Remove this code. The vector store handles collection creation for you for new collections.
Yeaaaaaah, it works!! Thanks a lot Logan! I appreciate your help.
I had the same issue. Do you recommend a snippet of code to enable hybrid search later, after uploading all the documents?
You can't really enable it afterwards; if you want hybrid, you need to generate the sparse embeddings at upload time.
But it makes the upload process really slow!
Is there a way to reduce the latency?
Indeed it does. If you want hybrid search, you should have the hardware to generate the sparse embeddings 👀

You can completely customize how the sparse embeddings are generated
https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/?h=qdrant+hyb#advanced-customizing-hybrid-search-with-qdrant
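As a shape-only sketch of what that docs page lets you plug in: the hashed bag-of-words encoder below is a toy stand-in (not a real sparse model), and the sparse_doc_fn / sparse_query_fn hook names are taken from that linked example, so check them against your installed version:

```python
from typing import List, Tuple


def sparse_doc_vectors(texts: List[str]) -> Tuple[List[List[int]], List[List[float]]]:
    """Toy sparse encoder: hashed bag-of-words term counts.

    Returns one (indices, values) pair per input text. Replace this with a
    real sparse model or an external embedding API for production use.
    """
    all_indices, all_values = [], []
    for text in texts:
        counts = {}
        for token in text.lower().split():
            idx = hash(token) % 100_000  # toy vocabulary of 100k hash buckets
            counts[idx] = counts.get(idx, 0) + 1
        all_indices.append(list(counts.keys()))
        all_values.append([float(c) for c in counts.values()])
    return all_indices, all_values


# In this toy setup, queries can reuse the same encoding.
sparse_query_vectors = sparse_doc_vectors

# Wiring it up (per the linked docs page; left commented so the sketch
# runs standalone):
# vector_store = QdrantVectorStore(
#     collection_name="my_collection",
#     client=client,
#     enable_hybrid=True,
#     sparse_doc_fn=sparse_doc_vectors,
#     sparse_query_fn=sparse_query_vectors,
# )
```

Anything cheaper than running a neural sparse model on CPU (a remote API, a batched GPU job, or a term-based scheme like BM25/SPLADE served externally) will speed up the upload.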