Updated 5 months ago

Am I doing something wrong when it comes to indexing?
I'm reading my Markdown files with SimpleDirectoryReader and trying to create a vector DB with Qdrant that supports hybrid search, but indexing is taking much longer than I expected. With ChromaDB it took nearly 2 hours for all the documents (understandable, I have a lot of documents),
but with Qdrant it's been nearly 7 hours so far :)))))

I launched Qdrant with Docker as described in the documentation, and this is my code:
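# Import paths assume llama-index 0.10 or later (not shown in the original post).
import qdrant_client
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Sync client for the local Qdrant instance started via Docker.
client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333,
    timeout=3000.0,  # seconds
)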

aclient = qdrant_client.AsyncQdrantClient(
    host="localhost",
    port=6333,
    timeout=3000.0
)

vector_store = QdrantVectorStore(
    collection_name="mydocuments",
    client=client,
    aclient=aclient,
    enable_hybrid=True,  # store sparse vectors too, for hybrid search
    batch_size=20,  # library default is 64
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    Vendors_docs,
    storage_context=storage_context,
)
4 comments
Any specific reason for setting the batch size to 20?
It seems to me that could be the reason for the increased time in your case.
The default is 64, and you have reduced it to 20.
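To put rough numbers on that: assuming the batch size is the number of points sent per upload request, a smaller batch means more round trips for the same number of nodes. A minimal sketch of the arithmetic (the 100,000 node count is a made-up figure, not from the original post, and `upload_requests` is a hypothetical helper):

```python
import math


def upload_requests(num_nodes: int, batch_size: int) -> int:
    """Number of upload requests needed to send num_nodes points in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_nodes / batch_size)


# Hypothetical corpus size, chosen only for illustration.
NUM_NODES = 100_000

requests_at_20 = upload_requests(NUM_NODES, 20)
requests_at_64 = upload_requests(NUM_NODES, 64)
print(requests_at_20, requests_at_64)
```

Under that assumption, a batch size of 20 needs roughly three times as many requests as the default. Whether that explains the slowdown depends on how much per-request overhead there is compared with the time spent computing embeddings.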
Hybrid mode runs a local model to compute the sparse embeddings. Even if you have a GPU, it can be quite slow.
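One way to check whether the sparse model is the bottleneck might be to index a small sample with hybrid enabled and again with it disabled, then extrapolate each run. A rough sketch of the timing part, assuming indexing time scales roughly linearly with document count; `estimate_total_seconds` and `index_fn` are hypothetical names, and `index_fn` stands for whatever call builds your index from a list of documents:

```python
import time
from typing import Callable, Sequence


def estimate_total_seconds(
    index_fn: Callable[[Sequence], object],
    sample: Sequence,
    total_count: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time index_fn on a sample and scale linearly to total_count documents."""
    if not sample:
        raise ValueError("sample must not be empty")
    start = clock()
    index_fn(sample)  # e.g. build the index on a few dozen documents
    elapsed = clock() - start
    # Linear extrapolation: an assumption, not a measured property.
    return elapsed * total_count / len(sample)
```

For example, you could pass a function that calls `VectorStoreIndex.from_documents` on `Vendors_docs[:50]` with a throwaway collection, once with `enable_hybrid=True` and once with `enable_hybrid=False`, and compare the two estimates.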