I'm using MongoDB as a vector store with a LlamaIndex backend. Retrieval works when I have only a few embedded documents, but as the number of embeddings grows, the backend can no longer find anything, not even the same information it was able to retrieve before the vector store grew. Any advice would be much appreciated.
This is more of a Mongo vector store problem 😅
I'm relatively new to this. What can I do for next steps?
The first step would be to debug the Mongo vector store code to see how it is working.
I use the standard llama-create python backend from https://github.com/run-llama/create-llama
I'm using text-embedding-3-large with 3072 dimensions and gpt-4-turbo-preview.
I pipe-concatenate the fields and embed them. MongoDB creates the embedding and metadata fields. I also have a Mongo Atlas search index defined with 3072 dimensions and cosine similarity.
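For reference, an Atlas Vector Search index definition matching that setup might look like the following sketch (the "embedding" path is an assumption; it must match the field name the vectors are actually stored under):

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 3072,
      "similarity": "cosine"
    }
  ]
}
```

A mismatch between this path or dimension count and the stored documents is a common reason retrieval silently returns nothing.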
Is there any way to see why or why not llamaindex uses a particular embedding from the vector store?
I thought it would be able to match fields based on the metadata object stored in each embedding.
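There's no direct trace of why a particular embedding is chosen, but conceptually retrieval just ranks the stored vectors by similarity to the query vector and keeps the top k. A dependency-free sketch of that ranking (toy 2-d vectors and made-up metadata, not the real 3072-d store):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "vector store": each entry has an embedding and a metadata dict.
store = [
    {"embedding": [1.0, 0.0], "metadata": {"doc": "a"}},
    {"embedding": [0.0, 1.0], "metadata": {"doc": "b"}},
    {"embedding": [0.9, 0.1], "metadata": {"doc": "c"}},
]

def top_k(query, k=2):
    """Return the metadata 'doc' ids of the k entries most similar to query."""
    ranked = sorted(store, key=lambda e: cosine(query, e["embedding"]), reverse=True)
    return [e["metadata"]["doc"] for e in ranked[:k]]

print(top_k([1.0, 0.0]))  # ['a', 'c']
```

So a document stops being retrieved when enough other vectors score closer to the query; the metadata is carried along with each hit but does not by itself influence the ranking.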
This is what I have in the index of the engine:
import logging
import os

from llama_index.core.indices import VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

logger = logging.getLogger("uvicorn")


def get_index():
    logger.info("Connecting to index from MongoDB...")
    store = MongoDBAtlasVectorSearch(
        db_name=os.environ["MONGODB_DATABASE"],
        collection_name=os.environ["MONGODB_VECTORS"],
        index_name=os.environ["MONGODB_VECTOR_INDEX"],
    )
    index = VectorStoreIndex.from_vector_store(store)
    logger.info("Finished connecting to index from MongoDB.")
    return index
How can I change index = VectorStoreIndex.from_vector_store(store) to include metadata in the embeddings? I suspect the metadata is stored in the vector store during embedding, but the BasicChatEngine, which uses the index above, doesn't necessarily know how to use the metadata field.
You need to add the metadata at the time the nodes are created, i.e. during ingestion. Once that's done, use vector_store_kwargs or metadata filter params at query time to restrict the query to those metadata.
The first part, during ingestion, is done. How would I do the second part? Any references would be appreciated. The MongoDBAtlasVectorSearch initialization function has metadata_key: str = "metadata" as an input param, which in my case is also the default name "metadata". I'm not querying directly; I have a BasicChatEngine which is initialized from the index. Would I need to change the BasicChatEngine to a query engine?
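Conceptually, a metadata filter just narrows the candidate set before similarity ranking. A toy, dependency-free illustration (the field names here are made up; in LlamaIndex the equivalent is passing a MetadataFilters object to the retriever or query engine):

```python
def apply_filters(entries, filters):
    """Keep only entries whose metadata matches every (key, value) pair."""
    return [
        e for e in entries
        if all(e["metadata"].get(k) == v for k, v in filters.items())
    ]

docs = [
    {"text": "q1 report", "metadata": {"year": 2023, "team": "sales"}},
    {"text": "q2 report", "metadata": {"year": 2024, "team": "sales"}},
    {"text": "design doc", "metadata": {"year": 2024, "team": "eng"}},
]

hits = apply_filters(docs, {"year": 2024, "team": "sales"})
print([d["text"] for d in hits])  # ['q2 report']
```

Similarity ranking then runs over only the surviving entries, so a filter is how metadata actually constrains retrieval, rather than the engine "matching fields" on its own.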
@kapa.ai @Logan M when I use the S3Reader to get documents, the text that I get is encrypted. How can I get the decrypted documents?
Let me check the knowledge base for you. One moment! :books:
----------------------
I'm sorry, but the provided knowledge sources do not contain information on how to decrypt documents when using the S3Reader in LlamaIndex. The S3Reader is designed to read files or directories from an S3 bucket and load them as documents. It does not inherently provide a decryption mechanism.

If your documents are encrypted, you would need to implement a decryption process either before using the S3Reader or after retrieving the documents. This would typically involve using the appropriate decryption key and algorithm that matches the encryption method used on your documents.

For more details on how to use the S3Reader, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai can i use langchain's S3 document reader to get documents and then use llamaindex to create nodes ?
Let me check the knowledge base for you. One moment! :books:
----------------------
Yes, you can use LangChain's S3 document reader to get documents and then use LlamaIndex to create nodes. Here's a general outline of the process:

  1. Use the S3Reader to load documents from an S3 bucket. The load_s3_files_as_docs method can be used for this purpose. Here's the method signature from the LlamaIndex documentation:
Plain Text
load_s3_files_as_docs(temp_dir=None) -> List[Document]


  2. Once you have the documents, you can use LlamaIndex to create nodes. The VectorStoreIndex.from_documents method can be used to create an index from the documents. Here's an example from the LlamaIndex documentation:
Plain Text
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents, fs=s3)


Please note that you'll need to replace SimpleDirectoryReader with the S3Reader and adjust the parameters accordingly. Also, fs=s3 indicates that the filesystem being used is S3.

Remember to handle the documents appropriately based on their format and your specific requirements.
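The conversion step the answer glosses over is just remapping fields between the two libraries' document types. A dependency-free sketch of that mapping (plain dataclasses stand in for langchain's Document and llama_index.core.Document, which have the same attribute names):

```python
from dataclasses import dataclass, field

@dataclass
class LCDocument:
    """Stand-in for langchain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class LIDocument:
    """Stand-in for llama_index.core.Document (text + metadata)."""
    text: str
    metadata: dict = field(default_factory=dict)

def convert(lc_docs):
    """Map LangChain-style documents onto LlamaIndex-style ones."""
    return [LIDocument(text=d.page_content, metadata=d.metadata) for d in lc_docs]

docs = convert([LCDocument(page_content="hello", metadata={"source": "s3://bucket/key"})])
print(docs[0].text)  # hello
```

With the real libraries installed, the converted documents can then be passed straight to VectorStoreIndex.from_documents.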
----------------------
----------------------