
Updated 3 months ago

Error setting up auto index retriever with QdrantVectorStore

Hey guys, wondering if any of y'all have fixed this error. I keep getting the following when trying to set up and use an auto index retriever: b'{"status":{"error":"Wrong input: Vector params for are not specified in config"},"time":0.000074995}'

I've tried both adding dense_config to QdrantVectorStore() and adding embed_model to VectorStoreIndex(), but neither resolved the issue. I also tried what was suggested above (https://discord.com/channels/1059199217496772688/1059200010622873741/1233183387884191774 - upgrading the python library) but that didn't work either. Here's the code for reference:

_SS_VECTOR_STORE = QdrantVectorStore(client=_QDRANT_CLIENT, collection_name="STAGE", dense_config=models.VectorParams(size=384, distance=models.Distance.COSINE))
_SS_STORAGE_CONTEXT = StorageContext.from_defaults(vector_store=_SS_VECTOR_STORE)
_SS_RECURSIVE_INDEX = VectorStoreIndex.from_vector_store(vector_store=_SS_VECTOR_STORE, storage_context=_SS_STORAGE_CONTEXT, embed_model=FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2"))

_RETRIEVER = VectorIndexAutoRetriever(
    _SS_RECURSIVE_INDEX, vector_store_info=_VECTOR_STORE_INFO, similarity_top_k=20
)
11 comments
I think this has to do with the default dense vector name being '' (an empty string)

I'm pretty sure if you just remove the dense_config param, it will work
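One way to check the empty-name theory is to look at which vector names the collection actually defines. The helper below is only a sketch: dense_vector_names is a hypothetical name, and it assumes the collection's vector config is either a single unnamed params object or a dict keyed by vector name, which is how qdrant-client reports it.

```python
from typing import Any, List


def dense_vector_names(vectors_config: Any) -> List[str]:
    """List the dense vector names a collection defines.

    Hypothetical helper. Assumes `vectors_config` is either a single params
    object (one unnamed vector, addressed as '') or a dict keyed by name.
    """
    if vectors_config is None:
        return []
    if isinstance(vectors_config, dict):
        # Named vectors: queries must address one of these names explicitly.
        return sorted(vectors_config)
    # Single unnamed vector: the default name '' is the right one to use.
    return [""]


if __name__ == "__main__":
    # Stand-in for a collection that only has a named dense vector.
    print(dense_vector_names({"fast-all-minilm-l6-v2": object()}))
```

With a live client, the input would come from something like _QDRANT_CLIENT.get_collection("STAGE").config.params.vectors. If '' is not in the returned list, a query against the default name cannot succeed.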
Still running into the error after removing the dense_config param unfortunately :/

But that's a good point about the blank vector name! I'll keep digging
A separate question maybe is, do you know what may be different between me uploading points to qdrant using llamaindex vs me uploading points through the typical qdrant syntax?

We're basically trying to switch to a different qdrant collection - one that uses both dense and sparse vectors - and use the retriever on that. But that might be what causes the trouble?

I'm realizing this because I'm querying the qdrant collection and getting the following kinds of results:
[QueryResponse(id='eedf3b5c-97f0-4be3-8f6e-6cf50b86a32e', embedding=None, sparse_embedding=None, metadata={'document':...

Not sure how embedding could be None, because there are definitely dense and sparse vectors attached (which is how we're querying in the first place). But maybe that's an issue to bring up with the qdrant folks? 😅
I realized that even doing the plain qdrant "query_points()" gives the same error so I've brought it up to the qdrant folks. "query()" is fine tho, which is weird. Anyway, thank you for taking a peek with me!
That does help!! I just got word from the qdrant team as well, and it turns out it's a known issue with named vectors. I believe this typically happens with hybrid/multimodal search where there are multiple kinds of vectors, like 'image' vs 'text'. In our case, we use fast-all-minilm-l6-v2 and fast-sparse-splade_pp_en_v1 as our named vectors. And like you suggested, it's using the default vector name of ''.

The way they fix this on the qdrant side is by adding the "using" param to the search/query calls, e.g.:
random_results = _SS_QDRANT_CLIENT.query_points(
    collection_name="STAGE",
    query=[0.0] * 384,
    using="fast-all-minilm-l6-v2",
)
print(f"Random results: {random_results}")
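To avoid hard-coding the name in every call, the arguments could be built in one place. This is a rough sketch, not anything from llamaindex or qdrant: query_points_kwargs is a made-up helper, and it only relies on query_points() taking collection_name, query, using and limit.

```python
from typing import Any, Dict, Optional, Sequence


def query_points_kwargs(
    collection_name: str,
    embedding: Sequence[float],
    vector_name: Optional[str] = None,
    limit: int = 10,
) -> Dict[str, Any]:
    """Build keyword arguments for a query_points() call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "collection_name": collection_name,
        "query": list(embedding),
        "limit": limit,
    }
    if vector_name:
        # Only named vectors need "using"; the unnamed default omits it.
        kwargs["using"] = vector_name
    return kwargs


if __name__ == "__main__":
    kwargs = query_points_kwargs("STAGE", [0.0] * 384, "fast-all-minilm-l6-v2")
    print(sorted(kwargs))
```

The result would then be passed along as _SS_QDRANT_CLIENT.query_points(**kwargs).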


Do you happen to know if there's something that would allow me to add this onto the retrieval step - or wherever makes sense for me to change this default vector name?
The links you added, like where exactly it queries things, help a lot - I think if worse comes to worst, I can probably try to overwrite that deep function, but I'm just wondering if there's a different (more elegant?) way
The qdrant guy suggested trying to use enable_hybrid since we have both kinds of vectors already. I got this error instead b'{"status":{"error":"Wrong input: Vector params for text-dense are not specified in config"},"time":0.000079313}'. I feel like we're SO close and that it's just a matter of renaming these in the config - I just haven't found out how 😅
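Both errors in this thread name the vector that was requested: an empty name in the first one, text-dense in this one. A small sketch for pulling that name out of the raw response, assuming the message wording stays exactly as quoted here (missing_vector_name is a hypothetical helper, not part of any library):

```python
import json
import re
from typing import Optional

# Wording copied from the errors quoted in this thread; an assumption that
# other qdrant versions phrase it the same way.
_PATTERN = re.compile(r"Vector params for\s*(.*?)\s*are not specified in config")


def missing_vector_name(raw: bytes) -> Optional[str]:
    """Return the vector name the server could not find, or None if no match."""
    message = json.loads(raw)["status"]["error"]
    match = _PATTERN.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    raw = (
        b'{"status":{"error":"Wrong input: Vector params for text-dense '
        b'are not specified in config"},"time":0.000079313}'
    )
    print(repr(missing_vector_name(raw)))
```

An empty string back means the query went to the unnamed default vector. Any other value is the name the caller asked for that the collection does not define.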
Okay fixed it! I ended up modifying exactly what you'd mentioned in query() to include the NamedVector:
response = self._client.search(
    collection_name=self.collection_name,
    query_vector=rest.NamedVector(
        name="fast-all-minilm-l6-v2",
        vector=query_embedding,
    ),
    limit=query.similarity_top_k,
    query_filter=query_filter,
)


Along with the parsing function parse_to_query_result()
Thanks again, Logan! Have a great rest of your week
Glad it works! And sorry for the troubles on that one 😅 I wonder if there's a way to make that easier for users