
Updated 2 years ago

I'm a bit confused about how the

At a glance

The community member is using Qdrant and is confused about how indices, documents, and vector stores interact. They ask how to specify Qdrant as the Document Store in the StorageContext, and whether Qdrant can store both the vectors and the documents (using the payload feature) while the created indices would need to be stored separately, such as in S3.

The comments explain that Qdrant (and most vector stores) store the entire index in the vector_store, which is a bit hacky but very convenient. To reconnect to a previously created vector store on Qdrant, the community member can use QdrantVectorStore and StorageContext.from_defaults to set up the index. They don't need to set an IndexStore in the StorageContext, as it won't be used.

The comments further clarify that with a vector store integration, the community member doesn't need to worry about calling persist(), as the text is also stored in the vector_store as part of the payload metadata. When retrieving the text using the reader, it will correspond to the chunk of the original unchunked text that the vector was made from.

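A commenter also asks what the use case for the reader is; the community member explains that they might need to rebuild the index from the stored documents without having users upload everything again.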

I'm a bit confused about how the interactions between indices, documents, and vector stores work. I'm using Qdrant atm. So I see from QdrantReader that it basically just assumes you're adding a payload keyed with 'text' to retrieve the documents from. For the StorageContext, how would I specify that I want to use Qdrant as the Document Store?

And to clarify, Qdrant would be able to store the vectors and the documents (kinda, by hacking the payload feature), but to store the created indices, I'd have to use something like S3? So the update/modify path would basically be: insert the new document into Qdrant, then load the associated index from S3, do the insertion separately there, and re-save it to S3?
15 comments
As far as I know, Qdrant (and most vector stores) store the entire index in the vector_store (a little hacky, but it's extremely convenient)

To reconnect to a vector store you created on Qdrant, you should be able to do something like this (assuming the client is connected to a Qdrant instance that already holds the collection):

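# Imports assume the 0.6.x-era package layout; `client` is an existing qdrant_client.QdrantClient.
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore
vector_store = QdrantVectorStore(client=client, collection_name="paul_graham")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex([], storage_context=storage_context)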
So I shouldn't need to set an IndexStore in the StorageContext?
It won't be used even if you set it 😅
(as far as I know)
Basically with a vector store integration, you don't really need to worry about calling persist()
Ok, and can I assume then the docs will also be stored with the corresponding text already chunked in the vector store?
So when I retrieve the text using the reader, it'll correspond to the chunk of the original unchunked text that the vector was made from?
Yea, the text is stored in the vector_store as well (for Qdrant, it's part of the payload metadata dict)
Curious what the use case is for the reader? I wasn't sure if anyone even used those lol
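To make the point above concrete, here is a minimal, standard-library-only sketch of the idea: each stored point carries its vector plus a payload holding the chunk text, so a similarity search hands back the text directly and no separate document or index store is involved. `ToyCollection`, `Point`, `upsert`, `search`, and the 2-D vectors are hypothetical stand-ins made up for illustration; they are not the Qdrant or LlamaIndex API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    id: int
    vector: list
    payload: dict  # e.g. {"text": "<chunk text>"}


@dataclass
class ToyCollection:
    """Hypothetical in-memory stand-in for one vector store collection."""
    points: dict = field(default_factory=dict)

    def upsert(self, point: Point) -> None:
        # Inserting or overwriting a point stores vector and text together.
        self.points[point.id] = point

    def search(self, query: list, limit: int = 1) -> list:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank all points by cosine similarity to the query, best first.
        ranked = sorted(
            self.points.values(),
            key=lambda p: cosine(query, p.vector),
            reverse=True,
        )
        return ranked[:limit]


collection = ToyCollection()
collection.upsert(Point(1, [1.0, 0.0], {"text": "Chunk about startups."}))
collection.upsert(Point(2, [0.0, 1.0], {"text": "Chunk about painting."}))

# The hit already contains the chunk text; nothing else has to be loaded.
hits = collection.search([0.9, 0.1], limit=1)
top_text = hits[0].payload["text"]
```

Because the text travels with the vector, updating the collection is a single upsert rather than a separate insert into an index saved elsewhere.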
Hopefully won't need it now, but was thinking of cases when I might need to rebuild the index from the docs without having the users upload everything again
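The rebuild scenario described here could look roughly like the sketch below: since every stored point already carries its chunk text in the payload, the chunks can be read back and re-embedded without the original uploads. The `stored_points` snapshot, the toy `embed` function, and `rebuild` are hypothetical placeholders chosen for illustration; this is not QdrantReader's actual interface.

```python
# Hypothetical snapshot of what the vector store holds per point:
# an id, the old vector, and a payload containing the chunk text.
stored_points = [
    {"id": 1, "vector": [0.1, 0.9], "payload": {"text": "aab"}},
    {"id": 2, "vector": [0.8, 0.2], "payload": {"text": "abb"}},
]


def embed(text):
    """Toy embedding (counts of 'a' and 'b'), standing in for a real model."""
    return [float(text.count("a")), float(text.count("b"))]


def rebuild(points, embed_fn):
    """Re-embed each stored chunk from its payload text, keeping ids and payloads."""
    return [
        {
            "id": p["id"],
            "vector": embed_fn(p["payload"]["text"]),
            "payload": p["payload"],
        }
        for p in points
    ]


rebuilt = rebuild(stored_points, embed)
```

Only the vectors change; the ids and payload text are carried over as they were.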
ah, I see! Hopefully this approach works for you then
Thanks! I'll probably be back with more questions lol, many changes from 0.5.x
Big upgrade lol! Sounds good :BOUNCE: