

I am using an ingestion pipeline to create a bunch of embeddings for a PDF article and store them in ChromaDB. What is the best practice for updating these embeddings? Currently, when I process the same file twice, it inserts another set of embeddings.
If you set docstore_strategy to UPSERTS_AND_DELETE, then every time you run the pipeline, only new or changed documents are processed and added to your Chroma vector store, and documents that are no longer present in the input are deleted from both the docstore and the vector store.

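```python
from llama_index.core.ingestion import DocstoreStrategy, IngestionPipeline

# `transformations` (e.g. a text splitter plus an embedding model), `docstore`
# (e.g. a SimpleDocumentStore) and `your_chroma_vector_store` (a ChromaVectorStore)
# are placeholders for objects you have already created.
pipeline = IngestionPipeline(
    transformations=transformations,
    docstore=docstore,
    vector_store=your_chroma_vector_store,
    docstore_strategy=DocstoreStrategy.UPSERTS_AND_DELETE,
)
```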
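To make the behaviour above concrete, here is a toy, pure-Python model of what UPSERTS_AND_DELETE does with a docstore and a vector store. It is a simplified sketch, not LlamaIndex's implementation: ToyUpsertsAndDeleteStore, content_hash and the fixed-size character chunking are hypothetical stand-ins (the real pipeline hashes whole Document objects, runs your transformations and writes real embeddings to Chroma). The point it illustrates is that change detection is keyed on the document ID, so the same file is only recognised as already ingested if its documents keep the same doc_id across runs (in LlamaIndex, for example, by loading with SimpleDirectoryReader(..., filename_as_id=True)) and the docstore itself survives between runs (for example via pipeline.persist(...) and pipeline.load(...)).

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a document's text so unchanged documents can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyUpsertsAndDeleteStore:
    """Hypothetical toy model of the UPSERTS_AND_DELETE idea (not LlamaIndex code)."""

    def __init__(self, chunk_size: int = 20):
        self.chunk_size = chunk_size
        self.doc_hashes: dict[str, str] = {}  # "docstore": doc_id -> content hash
        self.vectors: dict[str, tuple[str, str]] = {}  # "vector store": node_id -> (doc_id, chunk)

    def _delete_doc(self, doc_id: str) -> None:
        # Remove every node that was produced from this document.
        stale = [node_id for node_id, (ref, _) in self.vectors.items() if ref == doc_id]
        for node_id in stale:
            del self.vectors[node_id]

    def run(self, documents: dict[str, str]) -> tuple[list[str], list[str], list[str]]:
        """Ingest {doc_id: text}; return (upserted, skipped, deleted) doc ids."""
        upserted, skipped = [], []
        for doc_id, text in documents.items():
            h = content_hash(text)
            if self.doc_hashes.get(doc_id) == h:
                skipped.append(doc_id)  # same id, same content: nothing is re-embedded
                continue
            self._delete_doc(doc_id)  # new or changed: drop any old nodes first
            for i in range(0, len(text), self.chunk_size):
                node_id = f"{doc_id}-{i // self.chunk_size}"
                self.vectors[node_id] = (doc_id, text[i:i + self.chunk_size])
            self.doc_hashes[doc_id] = h
            upserted.append(doc_id)
        # Documents missing from this run are removed from both stores.
        deleted = [doc_id for doc_id in self.doc_hashes if doc_id not in documents]
        for doc_id in deleted:
            self._delete_doc(doc_id)
            del self.doc_hashes[doc_id]
        return upserted, skipped, deleted


if __name__ == "__main__":
    store = ToyUpsertsAndDeleteStore()
    text = "First version of the article text."
    print(store.run({"article.pdf": text}))  # (['article.pdf'], [], [])
    print(store.run({"article.pdf": text}))  # ([], ['article.pdf'], [])
    print(len(store.vectors))  # 2
```

Note the last step of the toy: a document that is missing from a run gets deleted. UPSERTS_AND_DELETE behaves the same way, so each run should receive every document you want to keep; if you update one file at a time, DocstoreStrategy.UPSERTS, which skips the delete step, may be a better fit.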