
Updated last year

Hello mates

Hello mates,

I am trying to insert a new document into existing embeddings but am unable to do that. The only alternative I see is to recreate the embeddings for all the files, but that increases the cost.

Can someone guide me on this particular thing?

I have a folder, /files, that holds the source documents.
Then a folder, /storage, that keeps the vectorized data.
Now if I add a new PDF to the /files folder, how can I embed just the new file into the existing vector store files?
9 comments
Plain Text
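from llama_index.core import SimpleDirectoryReader, load_index_from_storage  # older versions: from llama_index import ...

# Load every file currently in ./files
documents = SimpleDirectoryReader("./files").load_data()

# Load the previously persisted index
index = load_index_from_storage(...)
for doc in documents:
    index.insert(doc)  # embeds the document and adds it to the existing index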
Hello @Logan M , thanks for the quick help. Let me implement this code and get back to you.
@Logan M , this is indexing all the files in the folder. How can I add a check so that if a file_name already exists it does nothing, and only the newly uploaded file gets indexed?
This should work, assuming you are using the default vector store 👍

Plain Text
documents = SimpleDirectoryReader("./files", filename_as_id=True).load_data()
...
index.refresh_ref_docs(documents)
I should note that this assumes you already built your index using filename_as_id the first time
then the next time, using refresh should hopefully work
Okay, I got it. So I need to build the index the first time using filename_as_id=True. Once I create the vector store like this, refresh will check whether a document is already indexed; if it is, it will not reindex it, and it will only index the newly added document.
yea that's the idea! 🙏
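For anyone reading later, here is a rough sketch of the idea behind the refresh approach in plain Python. It is not the library's implementation: `content_hash`, `refresh`, and the `docstore` dict are hypothetical names made up for illustration, and it assumes the doc id is the file name (which is what filename_as_id gives you) and that a stored content hash is used to detect changes.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    """Fingerprint a document's text so changes can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh(docstore: Dict[str, str], documents: Dict[str, str]) -> List[bool]:
    """Hypothetical helper: docstore maps doc id -> stored hash,
    documents maps doc id (file name) -> text.
    Returns one flag per document: True if it was new or changed."""
    refreshed = []
    for doc_id, text in documents.items():
        new_hash = content_hash(text)
        if docstore.get(doc_id) == new_hash:
            # Same id and same content: skip, no embedding cost.
            refreshed.append(False)
            continue
        # New or changed file: this is where the embedding call would go.
        docstore[doc_id] = new_hash
        refreshed.append(True)
    return refreshed


docstore: Dict[str, str] = {}
# First run: everything is new, so everything gets embedded.
first = refresh(docstore, {"files/a.pdf": "alpha", "files/b.pdf": "beta"})
# Second run: one new file was added, the other two are unchanged.
second = refresh(
    docstore,
    {"files/a.pdf": "alpha", "files/b.pdf": "beta", "files/c.pdf": "gamma"},
)
print(first)   # [True, True]
print(second)  # [False, False, True]
```

This also shows why the first build needs stable ids: if the ids were random on every load, nothing in the store would ever match and every file would look new.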