
Updated 3 months ago

Hey! I have a newbie question.

I'm relatively new to VectorStores. What I want is to accumulate data over time (just like ReBuff does when checking for prompt injection). However, I do not want to overwrite anything that already exists.
I have a RAG pipeline that gathers information and inserts it into an external vector store, but it is dynamic, so I'm not sure whether something already exists, and I would not want to waste resources!
Thanks! Glad to be here!
5 comments
Have you seen the ingestion pipeline? If you attach a docstore and vectorstore to it, it will handle upserts for you

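from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline

# docstore and vector_store are assumed to have been created earlier
pipeline = IngestionPipeline(..., docstore=docstore, vector_store=vector_store)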

pipeline.run(documents=documents)

index = VectorStoreIndex.from_vector_store(vector_store)


You just have to make sure the docstore is saved somewhere (to disk, or using integrations like MongoDB, Redis, etc.)
Thank you! I've just come across this: https://docs.llamaindex.ai/en/stable/examples/ingestion/ingestion_gdrive/
Which is basically the same! 😄
Does this use the metadata on the Document itself as well?
For example, when ingesting Wikipedia pages and setting the metadata to be the URL.
Or just the doc hash?
It's using a hash of the content + metadata, and mapping that to the ID of a document object.
So as long as the ID is consistent (a URL, a filepath, etc) it can compare and upsert properly
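To make the idea above concrete, here is a rough standard-library sketch of hash-based upserts. It is not LlamaIndex's actual implementation: `TinyDocstore`, `content_hash`, and the example URL are hypothetical names made up for illustration. The assumption is that the docstore only needs to remember the last hash seen for each document ID, and that it is saved to a JSON file between runs.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional


def content_hash(text: str, metadata: dict) -> str:
    """Hash the text together with its metadata (keys sorted for stability)."""
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TinyDocstore:
    """Hypothetical stand-in for a docstore: maps document ID -> last seen hash."""

    def __init__(self, path: Optional[str] = None):
        self.path = Path(path) if path else None
        self.hashes: Dict[str, str] = {}
        # Reload earlier state so unchanged documents are skipped across runs.
        if self.path and self.path.exists():
            self.hashes = json.loads(self.path.read_text())

    def upsert(self, doc_id: str, text: str, metadata: dict) -> str:
        """Return 'inserted', 'updated', or 'skipped' for this document."""
        new_hash = content_hash(text, metadata)
        old_hash = self.hashes.get(doc_id)
        if old_hash == new_hash:
            return "skipped"  # same ID, same content + metadata: nothing to do
        self.hashes[doc_id] = new_hash
        # A real pipeline would (re)embed and write to the vector store here.
        return "inserted" if old_hash is None else "updated"

    def persist(self) -> None:
        """Write the ID -> hash map to disk, if a path was given."""
        if self.path:
            self.path.write_text(json.dumps(self.hashes))


store = TinyDocstore()
url = "https://en.wikipedia.org/wiki/Vector_database"  # example ID only
print(store.upsert(url, "first version", {"url": url}))   # inserted
print(store.upsert(url, "first version", {"url": url}))   # skipped
print(store.upsert(url, "edited version", {"url": url}))  # updated
```

Using the URL as the ID is what lets the second call be recognized as the same document. With a random ID on every run, each document would look new and get inserted again.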