Qdrant

At a glance

A community member runs code that adds 2,500 documents (NOTES) to a Qdrant vector store, but the resulting vectors_count is only around 1,300. They expected at least 2,500 vectors, since chunking should produce at least one vector per document. The comments reveal that some of the documents were empty, which accounts for at least part of the gap: the ingestion pipeline skips empty documents to avoid problems with the embedding model.

chantlong

I'm not sure if this is the right place to ask, but when I run this code with NOTES (my documents) having a length of 2500, the vectors_count in Qdrant after ingestion is only around 1300. If I add 2500 docs, shouldn't I end up with a vectors_count of at least 2500, given the chunking?


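# Import paths assume the llama-index 0.10+ package layout
import qdrant_client
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

# NOTES is the list of ~2500 documents loaded earlier (not shown in the post)
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="NOTES")

# Split each document into chunks, embed them, and write them to Qdrant
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=0),
        HuggingFaceEmbedding(model_name='XXXXX'),
    ],
    vector_store=vector_store,
)

pipeline.run(documents=NOTES)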

3 comments

WhiteFang_Jr

True, it should at least show a count of 2500 👀

chantlong

Found out some of the docs had empty text, which can explain part of the reason.

Logan M

Yes! It skips empty docs, since otherwise the embedding model kind of explodes lol
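
For anyone who wants to check how many of their docs fall into the skipped bucket before ingesting, here is a rough sketch. The `Doc` class and the `split_empty` helper are hypothetical stand-ins made up for illustration, not part of LlamaIndex or Qdrant, and treating whitespace-only text as empty is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document object that exposes a `text` field."""
    text: str


def split_empty(docs):
    """Separate documents with real content from empty or whitespace-only ones.

    Hypothetical helper: counting whitespace-only text as empty is an assumption.
    """
    kept = [d for d in docs if d.text and d.text.strip()]
    dropped = [d for d in docs if not (d.text and d.text.strip())]
    return kept, dropped


if __name__ == "__main__":
    notes = [Doc("first note"), Doc(""), Doc("   "), Doc("second note")]
    kept, dropped = split_empty(notes)
    print(f"{len(kept)} non-empty, {len(dropped)} empty")
```

With real data you would call something like `split_empty(NOTES)`, assuming your documents expose their content as `.text`. Since every non-empty doc should produce at least one chunk, `len(kept)` is roughly the minimum vectors_count to expect.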
