Updated 6 months ago

Hey! I have a newbie question.

I'm relatively new to VectorStores. What I want is to accumulate data over time (just like ReBuff does when checking for prompt injection). However, I do not want to overwrite anything that already exists.
I have a RAG pipeline that gathers information and inserts it into an external vector store, but the data is dynamic, so I'm not sure whether something already exists, and I would not want to waste resources!
Thanks! Glad to be here!

5 comments

Logan M

Have you seen the ingestion pipeline? If you attach a docstore and vectorstore to it, it will handle upserts for you

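# import paths assume llama-index >= 0.10
from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(..., docstore=docstore, vector_store=vector_store)

pipeline.run(documents=documents)

index = VectorStoreIndex.from_vector_store(vector_store)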

You just have to make sure the docstore is persisted somewhere (to disk, or using integrations like MongoDB, Redis, etc.). Otherwise the pipeline has no record of what it already ingested.
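To illustrate why the docstore has to survive between runs, here is a minimal stdlib-only sketch. It is not LlamaIndex code: the docstore is modelled as a plain dict of document ID to hash, and `load_docstore`, `save_docstore`, and the JSON file are hypothetical stand-ins for whatever persistence backend you pick.

```python
import json
import os
import tempfile


def load_docstore(path):
    """Load the ID -> hash mapping, or start empty on the very first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_docstore(docstore, path):
    """Write the ID -> hash mapping to disk so the next run can see it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(docstore, f)


# Hypothetical storage location for this demo
path = os.path.join(tempfile.mkdtemp(), "docstore.json")

# First run: nothing is known yet, so the document gets recorded
run1 = load_docstore(path)
run1["https://example.com/page"] = "hash-of-content-and-metadata"
save_docstore(run1, path)

# Second run (e.g. after a restart): the record is still there
run2 = load_docstore(path)
```

If the save step is skipped, every restart begins with an empty mapping and each document looks new again, which is exactly the wasted work the question is trying to avoid.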

TheDorsan

Thank you! I've just come across this: https://docs.llamaindex.ai/en/stable/examples/ingestion/ingestion_gdrive/
Which is basically the same! 😄
Does this use the metadata on the Document itself as well?
For example, when ingesting Wikipedia pages and setting the metadata to be the URL.

TheDorsan

Or just the doc hash?

Logan M

It's using a hash of the content + metadata, and mapping that to the ID of a document object

Logan M

So as long as the ID is consistent (a URL, a file path, etc.), it can compare and upsert properly
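The compare-and-upsert idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's actual implementation: `doc_hash` and `upsert` are hypothetical helpers, SHA-256 over a JSON dump is an assumed hashing scheme, and both stores are plain dicts.

```python
import hashlib
import json


def doc_hash(text, metadata):
    """Hash content and metadata together; sorted keys keep the hash stable."""
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def upsert(docstore, vector_store, doc_id, text, metadata):
    """Return 'inserted', 'updated', or 'skipped' for a single document."""
    new_hash = doc_hash(text, metadata)
    old_hash = docstore.get(doc_id)
    if old_hash == new_hash:
        return "skipped"  # same ID, same hash: nothing to do
    action = "inserted" if old_hash is None else "updated"
    vector_store[doc_id] = text  # stand-in for embedding + writing to the vector store
    docstore[doc_id] = new_hash  # remember the hash for the next comparison
    return action


docstore, vector_store = {}, {}
url = "https://en.wikipedia.org/wiki/Vector_database"  # the consistent document ID

first = upsert(docstore, vector_store, url, "v1 text", {"url": url})
second = upsert(docstore, vector_store, url, "v1 text", {"url": url})
third = upsert(docstore, vector_store, url, "v2 text", {"url": url})
```

Here the first call inserts, the identical second call is skipped, and the third call (changed content under the same URL) replaces the earlier entry instead of adding a duplicate. If the ID were a fresh random value on every run, the lookup would never match and everything would be re-inserted.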
