yes I did look at this reader The issue


Yes, I did look at this reader. The issue is that the file is not stored in MongoDB as-is: GridFS splits it into chunks, so I'm not sure it would work.
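For context, GridFS stores each file as one metadata document in `fs.files` plus a series of documents in `fs.chunks`. Each chunk holds a `files_id`, a zero-based index `n`, and a `data` slice of the raw bytes (255 KB per chunk by default). The chunks are simply consecutive pieces of the original file, so putting them back in `n` order recovers it byte for byte; in practice pymongo's `gridfs.GridFS(db).get(file_id).read()` does this for you. The minimal sketch below only illustrates that reassembly, working on plain dicts so it runs without a MongoDB server; the function name is hypothetical.

```python
# Hypothetical helper: rebuild a file's bytes from GridFS-style chunk documents.
# Assumes each chunk dict has the GridFS fields files_id, n (0-based index) and data (bytes).
from typing import Any, Iterable, Mapping


def reassemble_gridfs_file(chunks: Iterable[Mapping[str, Any]], files_id: Any) -> bytes:
    """Concatenate the `data` of every chunk belonging to `files_id`, in `n` order."""
    own = [c for c in chunks if c["files_id"] == files_id]
    own.sort(key=lambda c: c["n"])
    # GridFS numbers a file's chunks 0, 1, 2, ... with no gaps; fail loudly otherwise.
    for expected, chunk in enumerate(own):
        if chunk["n"] != expected:
            raise ValueError(f"missing chunk {expected} for file {files_id!r}")
    return b"".join(bytes(c["data"]) for c in own)


if __name__ == "__main__":
    fake_chunks = [
        {"files_id": 1, "n": 1, "data": b"world"},
        {"files_id": 2, "n": 0, "data": b"other file"},
        {"files_id": 1, "n": 0, "data": b"hello "},
    ]
    print(reassemble_gridfs_file(fake_chunks, 1))  # b'hello world'
```

The result is the original file, not text: a PDF comes back as PDF bytes, which is why it still needs a reader that understands the format.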


WhiteFang_Jr

I'm not familiar with GridFS, but if the data is already chunked, you can fetch it with a GridFS query and use the Document class to create node objects. Then you can simply pass them into VectorStoreIndex like this:


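from llama_index import Document, VectorStoreIndex  # pre-0.10 import path (service_context era)

documents = []
# fetch all the chunks and iterate over them to make Document objects
# add other useful info like the filename or any other important info to metadata
# each chunk must be text (str), not binary
for chunk in mongo_chunks:  # mongo_chunks: the chunks returned by your GridFS query
    documents.append(Document(text=chunk, metadata={"key": "value"}))

# service_context: your existing ServiceContext (LLM, embedding model, etc.)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)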
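One catch with the snippet above: GridFS chunks are raw bytes cut at fixed-size boundaries, not text. If the stored file happens to be plain UTF-8 text (an assumption; PDFs, DOCX and similar formats are not text at all), calling `.decode()` on each chunk separately can fail whenever a multi-byte character such as `é` (`b"\xc3\xa9"`) straddles a chunk boundary. A minimal sketch of one way around this uses Python's incremental decoder, which carries a partial character over to the next chunk; the function name is hypothetical.

```python
# Sketch: turn binary GridFS chunks into text pieces without breaking characters.
# Assumption: the stored file is plain text in `encoding` (UTF-8 by default).
import codecs
from typing import Iterable, List


def decode_chunks(binary_chunks: Iterable[bytes], encoding: str = "utf-8") -> List[str]:
    """Decode byte chunks in order, returning the non-empty text pieces.

    The incremental decoder buffers an incomplete multi-byte sequence at the end
    of one chunk and finishes it with the start of the next chunk.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for chunk in binary_chunks:
        text = decoder.decode(chunk)
        if text:  # a chunk holding only part of a character yields ""
            pieces.append(text)
    # final=True flushes the decoder and raises UnicodeDecodeError if the data ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        pieces.append(tail)
    return pieces
```

Each piece could then become `Document(text=piece, metadata={...})` as in the snippet above. Since the chunk boundaries are arbitrary byte offsets rather than meaningful splits, joining the pieces into a single Document with `"".join(pieces)` and letting LlamaIndex's own node parser do the splitting is probably the cleaner choice.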

Tungdepzai

This is what a saved file looks like in MongoDB: it is stored as binary chunks, not text. The only way I know to retrieve the file is to download it, so I was thinking of downloading it to the server and feeding it through SimpleDirectoryReader.
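The download-then-read idea can be sketched as follows, under a few assumptions: the bytes come from GridFS (for example via pymongo's `GridOut`), the original filename and extension are kept so that SimpleDirectoryReader can pick a parser by extension, and each file gets its own fresh temporary directory. The helper name `save_to_temp_dir` and the `download.bin` fallback are hypothetical.

```python
# Hypothetical helper: write a downloaded GridFS file into a fresh temp directory
# under its original filename, ready for a directory-based loader.
import tempfile
from pathlib import Path
from typing import Optional


def save_to_temp_dir(filename: Optional[str], content: bytes) -> Path:
    """Write `content` to <new temp dir>/<filename> and return the directory path."""
    target_dir = Path(tempfile.mkdtemp(prefix="gridfs_"))
    # Keep only the last path component so a stored name like "../x.txt" cannot
    # escape the directory; fall back to a generic name if GridFS has none.
    safe_name = Path(filename or "").name or "download.bin"
    (target_dir / safe_name).write_bytes(content)
    return target_dir
```

With pymongo, `grid_out = gridfs.GridFS(db).get(file_id)` returns an object whose `.filename` and `.read()` supply the two arguments, and `SimpleDirectoryReader(input_dir=str(folder)).load_data()` then loads the file as Documents. Using one temporary directory per file keeps unrelated files out of the reader; remove it afterwards (for example with `shutil.rmtree(folder)`).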

