
Updated 4 months ago

Heyhey another MongoDbAtlasVectorSearch

Heyhey, another MongoDbAtlasVectorSearch question. I am currently using a WeaviateVectorStore for local development, and we want to use MongoDB for production.

To work with multiple indices in Weaviate, I need to create a new WeaviateVectorStore and redefine the index_name whenever I work with a different index. This prevents Weaviate from storing all Nodes under the same index name, which would make every index retrieve documents assigned to the other indices. This works as expected with Weaviate, with few issues besides some quirky code.

With MongoDB Atlas I wanted to do the same, to keep my indices from using documents assigned to other indices. So I am once again creating a MongoDBAtlasVectorSearch object with a unique index_name. However, neither the debug logs nor the MongoDB Atlas collection viewer shows any trace of the unique index_name I assigned to this vector store. Instead, it simply inserts a JSON representation of the Node, with no apparent reference to the specified index_name. At query time, however, the debug logs do reference my index_name: it is used as the search index when the query pipeline is built.
Query debug:
```text
DEBUG:llama_index.vector_stores.mongodb:Running query pipeline: [{'$search': {'index': 'QApp_2820b774_5218_4e20_b389_0ebdb2fc4765', 'knnBeta': {'vector': [<vector>], 'path': 'embedding', 'k': 2}}}, {'$project': {'score': {'$meta': 'searchScore'}, 'embedding': 0}}]
```
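Rebuilding the logged query pipeline by hand makes it clear where index_name ends up. It is passed as `$search.index`, which names an Atlas Search index defined on the collection, and it is never written onto the documents themselves. This is a minimal sketch that only mirrors the shape in the log; `build_knn_pipeline` is a hypothetical helper, not a llama_index function.

```python
def build_knn_pipeline(index_name, query_vector, k=2, path="embedding"):
    """Rebuild the $search pipeline seen in the debug log (illustrative sketch)."""
    return [
        {
            "$search": {
                # `index` names an Atlas Search index defined on the collection;
                # it is not a field stored on the inserted documents.
                "index": index_name,
                "knnBeta": {"vector": query_vector, "path": path, "k": k},
            }
        },
        # Expose the relevance score and drop the (large) embedding array.
        {"$project": {"score": {"$meta": "searchScore"}, "embedding": 0}},
    ]


pipeline = build_knn_pipeline("QApp_2820b774_5218_4e20_b389_0ebdb2fc4765", [0.1, 0.2, 0.3])
```

With pymongo, a list like this would be passed to `collection.aggregate(pipeline)`; presumably it only returns hits if an Atlas Search index with that exact name exists on the collection.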

Document insert debug:
```text
DEBUG:llama_index.vector_stores.mongodb:Inserting data into MongoDB: [{'id': '8e7c7e88-25d5-4f2e-ba01-de373c0c0516', 'embedding': [<vector>], 'text': <document text>, 'metadata': {<metadata>} etc. etc. 
```
Looking at the source code, the only sensible explanation for the default value of index_name is that it maps to the search index defined in MongoDB Atlas.
Please don't tell me that for every index I want to query, I have to manually create a new MongoDB search index, which I then somehow have to map dynamically to only the documents assigned to that index. The documents stored in MongoDB (default_db/default_collection) have practically no reference to the index they were inserted from. I genuinely can't think of a way to go from an index_id in the MongoDB index store data to a list of the document IDs assigned to that index. I don't think this is even supported, since I read somewhere in the documentation (at some point) that documents can be assigned to multiple indices as well.
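One possible workaround, which is an assumption and not something confirmed in this thread: keep a single Atlas Search index, write a hypothetical `index_id` key into each record's `metadata` at insert time, and filter on it at query time. The insert log shows node metadata stored under `metadata`, so a tag set there would presumably be persisted. Because `$match` runs after `$search`, it is a post-filter and can leave fewer than `k` hits, so the sketch over-fetches. `tag_document`, `build_filtered_pipeline`, `index_id` and `overfetch` are all illustrative names.

```python
import copy


def tag_document(doc, index_id):
    """Return a copy of an inserted record with a hypothetical `index_id` metadata tag."""
    tagged = copy.deepcopy(doc)
    tagged.setdefault("metadata", {})["index_id"] = index_id
    return tagged


def build_filtered_pipeline(search_index, query_vector, index_id, k=2, overfetch=5):
    """Query one shared Atlas Search index, keeping only nodes tagged with `index_id`."""
    return [
        {
            "$search": {
                "index": search_index,
                # Over-fetch, because the $match below drops other indices' nodes.
                "knnBeta": {"vector": query_vector, "path": "embedding", "k": k * overfetch},
            }
        },
        {"$project": {"score": {"$meta": "searchScore"}, "embedding": 0}},
        # Post-filter on the hypothetical tag written by tag_document().
        {"$match": {"metadata.index_id": index_id}},
        {"$limit": k},
    ]


record = {"id": "node-1", "embedding": [0.1, 0.2], "text": "example", "metadata": {}}
tagged = tag_document(record, "index-a")
pipeline = build_filtered_pipeline("shared_search_index", [0.1, 0.2], "index-a")
```

Atlas Search may also support filtering inside the search stage itself on indexed fields, which would avoid over-fetching; check the Atlas Search documentation before relying on either approach.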
I understand your comment about not using MongoDB Atlas if this is the case @Logan M 😂 😭
Hahaha yeaaaa you are hitting all the fun parts with it 😂 good details to take to your team if they are pushing for mongodb lol

If you do figure out how to do any of this from their API, feel free to make a PR to make this vector store better 🥰
Oh my lord I'll try my best 😂😂 I might try it just for the sake of mongodb, I already convinced my team to stick to weaviate for our MVP
I will do some experimenting for the sake of llamaindex, but I genuinely have no clue how to get started. It doesn't seem like the index store concretely references the documents, so I don't know how to go from an index ID to the list of documents related to that index.
If you have any pointers I will love you forever (platonically 😂😉)
Hmm yea when using a vector store db, the index store and docstore don't actually get used -- everything goes into the vector index

You can override this by setting store_nodes_override=True in the VectorStoreIndex constructor, but then you have to manage loading/saving the docstore and index store yourself
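Managing that bookkeeping yourself is also one way to answer the earlier question of how to go from an index ID to its documents: persist the mapping explicitly. This is a minimal sketch, assuming node IDs are the `id` values written on insert; `IndexRegistry` and its JSON file are hypothetical and not part of llama_index.

```python
import json
import tempfile
from pathlib import Path


class IndexRegistry:
    """Hypothetical JSON-backed map from an index ID to the node IDs assigned to it."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload a previously saved mapping, if one exists.
        self.mapping = json.loads(self.path.read_text()) if self.path.exists() else {}

    def add(self, index_id, node_ids):
        # Keep a set per index: one node may belong to several indices.
        ids = set(self.mapping.get(index_id, []))
        ids.update(node_ids)
        self.mapping[index_id] = sorted(ids)

    def node_ids(self, index_id):
        return list(self.mapping.get(index_id, []))

    def save(self):
        self.path.write_text(json.dumps(self.mapping, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    registry_path = Path(tmp) / "index_registry.json"
    registry = IndexRegistry(registry_path)
    registry.add("index-a", ["node-1", "node-2"])
    registry.add("index-b", ["node-2"])
    registry.save()
    print(IndexRegistry(registry_path).node_ids("index-a"))  # ['node-1', 'node-2']
```

Mapping each index to a list of nodes, rather than each node to one owner, allows a document to sit in several indices; the stored IDs could then drive a query-side filter.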

Hope that makes some sense 😅
That sounds like an exceptional amount of pain 😂 I will try to figure something out this weekend. If I'm not out drinking, I'll try my best to get something working and potentially make a PR to the repo
have at least one on me! 🍻 😆
This one's on you 😂♥️