You can update a document's metadata directly through the `get` and `update` methods on the `chroma_collection` object:

```python
doc_to_update = chroma_collection.get(limit=1)
doc_to_update["metadatas"][0] = {
    **doc_to_update["metadatas"][0],
    **{"author": "Paul Graham"},
}
chroma_collection.update(
    ids=[doc_to_update["ids"][0]], metadatas=[doc_to_update["metadatas"][0]]
)
updated_doc = chroma_collection.get(limit=1)
print(updated_doc["metadatas"][0])
```
In this snippet, the `get` method retrieves the first document from the Chroma DB, its metadata is updated to change the author to "Paul Graham", and the `update` method writes the change back to the Chroma DB. The updated document is then retrieved and its metadata is printed to confirm the update. Replace `"author": "Paul Graham"` with the actual changes you want to make to the document's nodes.
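For context, `chroma_collection` above is assumed to be an existing Chroma collection. A minimal sketch of how you might obtain one (the path and collection name here are placeholders, not part of the original example):

```python
import chromadb

# Connect to a persistent Chroma database on disk (path is a placeholder).
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Fetch or create the collection your documents were ingested into.
chroma_collection = chroma_client.get_or_create_collection("my_collection")
```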
If you want updates handled automatically at ingestion time, you can use the `docstore_strategy` parameter of the `IngestionPipeline`. When `docstore_strategy` is set to `DocstoreStrategy.UPSERTS`, the pipeline will handle upserting data into your vector store: if a duplicate `doc_id` is detected and the hash of the document has changed, the document will be re-processed and upserted; if the hash is unchanged, the node is skipped.

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.ingestion import (
    DocstoreStrategy,
    IngestionPipeline,
    IngestionCache,
)
from llama_index.core.ingestion.cache import RedisCache
from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.redis import RedisVectorStore

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        embed_model,
    ],
    docstore=RedisDocumentStore.from_host_and_port(
        "localhost", 6379, namespace="document_store"
    ),
    vector_store=RedisVectorStore(
        index_name="redis_vector_store",
        index_prefix="vectore_store",
        redis_url="redis://localhost:6379",
    ),
    cache=IngestionCache(
        cache=RedisCache.from_host_and_port("localhost", 6379),
        collection="redis_cache",
    ),
    docstore_strategy=DocstoreStrategy.UPSERTS,
)
```
In this example, the `IngestionPipeline` is set up with a `RedisDocumentStore` as the docstore and a `RedisVectorStore` as the vector store, and the `docstore_strategy` is set to `DocstoreStrategy.UPSERTS`, which means that the pipeline will handle upserting data into the vector store.
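As a rough sketch of how the upsert behavior plays out when the pipeline runs (the documents and `doc_id` values below are illustrative, not from the original example):

```python
from llama_index.core import Document

documents = [
    Document(text="The first document.", doc_id="doc_1"),
    Document(text="The second document.", doc_id="doc_2"),
]

# First run: both documents are processed and stored.
nodes = pipeline.run(documents=documents)

# Re-ingest with one document changed: its doc_id is the same but its hash
# differs, so UPSERTS re-processes it; the unchanged document is skipped.
documents[0] = Document(text="The first document, now edited.", doc_id="doc_1")
nodes = pipeline.run(documents=documents)
```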
Another option is the `update_ref_doc` method provided by LlamaIndex. This method updates a document and its corresponding nodes; it's equivalent to deleting the document and then inserting it again.

```python
doc_chunks[0].text = "Brand new document text"
index.update_ref_doc(
    doc_chunks[0],
    update_kwargs={"delete_kwargs": {"delete_from_docstore": True}},
)
```
Here, the `update_ref_doc` method is called to update the document in the index, and the `delete_kwargs` parameter is used to ensure that the document is deleted from the docstore before it's re-inserted. The `update_ref_doc` method is part of the `BaseIndex` class in LlamaIndex, so depending on the specific index you're using, its implementation may vary. For example, `update_ref_doc` is not implemented for the `LlamaCloudIndex` and will raise a `NotImplementedError` if called.
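The `update_ref_doc` snippet above assumes that `doc_chunks` and `index` already exist. A rough end-to-end sketch under that assumption (document texts and ids are illustrative, and an embedding model is assumed to be configured, e.g. via `Settings` or an OpenAI API key):

```python
from llama_index.core import Document, VectorStoreIndex

# Documents with stable, explicit ids so they can be updated or deleted later.
doc_chunks = [
    Document(text=f"Text of document {i}", doc_id=f"doc_id_{i}") for i in range(3)
]

index = VectorStoreIndex.from_documents(doc_chunks)

# Apply the update shown earlier.
doc_chunks[0].text = "Brand new document text"
index.update_ref_doc(
    doc_chunks[0],
    update_kwargs={"delete_kwargs": {"delete_from_docstore": True}},
)
```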
The `IngestionPipeline` in LlamaIndex is designed to process and ingest new or updated documents into a document store and vector store. Updating the nodes of existing documents is not its primary purpose, however, and it might not be the most efficient way to achieve your goal. If you frequently need to update the nodes of documents without changing the documents themselves, you might want to consider a different approach or tool that is better suited to your needs. For more details on the `IngestionPipeline`, you can refer to the LlamaIndex documentation.
method provided by the index. This method deletes a document and its corresponding nodes from the index.index.delete_ref_doc("doc_id_0", delete_from_docstore=True)
delete_ref_doc
method is called with the ID of the document you want to delete. The delete_from_docstore
parameter is set to True
, which means that the document will also be deleted from the docstore.delete_ref_doc
method is part of the BaseIndex
class in LlamaIndex. Depending on the specific index you're using, the implementation of this method may vary.
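If you want to verify the deletion, one possible check (a sketch, assuming the index exposes the `ref_doc_info` property as `BaseIndex` does) is to compare the tracked reference documents before and after:

```python
# Reference documents currently tracked by the index.
print(index.ref_doc_info.keys())

# Delete one document, its nodes, and its docstore entry.
index.delete_ref_doc("doc_id_0", delete_from_docstore=True)

# "doc_id_0" should no longer appear.
print(index.ref_doc_info.keys())
```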