To save the indices, use the storage context's `persist()` method. Here's an example:

```python
# Save the index
storage_context.persist()

# Note down the index IDs
list_id = summary_index.index_id
vector_id = vector_index.index_id
keyword_id = keyword_table_index.index_id
```
To load an index again, use the `load_index_from_storage()` function from the `llama_index.core` module. You'll need to recreate the storage context and provide the index ID.

(Source: **LlamaIndex docs: Understanding > Storing > Using Vector Stores** (<https://docs.llamaindex.ai/en/latest/understanding/storing/storing#using-vector-stores>))
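Why the IDs are worth noting down: when several indices are persisted into one storage directory, the index store keeps one entry per index, so a load has to name the one it wants. The sketch below is a toy, stdlib-only model of that lookup, assuming a single JSON file keyed by index ID. The file layout and the helpers `persist_indices` and `load_index` are hypothetical and are not LlamaIndex's on-disk format or API.

```python
import json
import os
import tempfile


def persist_indices(persist_dir, indices):
    """Write every index struct, keyed by index ID, to one JSON file (toy format)."""
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "index_store.json"), "w") as f:
        json.dump(indices, f)


def load_index(persist_dir, index_id=None):
    """Return one index struct; an ID is required when several are stored."""
    with open(os.path.join(persist_dir, "index_store.json")) as f:
        indices = json.load(f)
    if index_id is None:
        if len(indices) != 1:
            raise ValueError(
                f"Expected a single index but found {len(indices)}; pass index_id."
            )
        return next(iter(indices.values()))
    return indices[index_id]


with tempfile.TemporaryDirectory() as persist_dir:
    # Three indices persisted side by side (hypothetical IDs)
    persist_indices(persist_dir, {
        "list-1": {"type": "summary", "node_ids": ["n1", "n2"]},
        "vec-1": {"type": "vector", "node_ids": ["n1", "n2"]},
        "kw-1": {"type": "keyword_table", "node_ids": ["n2"]},
    })
    # The ID picks out exactly one of them
    vector_struct = load_index(persist_dir, index_id="vec-1")
    print(vector_struct["type"])  # vector
```

Loading without an ID only works when the directory holds a single index; otherwise the toy raises, which is why the IDs are recorded at save time.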
Please replace `"quickstart"` with the name of your Chroma collection and `list_id` with the ID of the index you want to load.

To persist a `DocumentSummaryIndex` instance, use the `persist()` method of the `storage_context` associated with your index. Here's how:

```python
# Persist the index's storage context (docstore, index store, etc.) to ./index
doc_summary_index.storage_context.persist("index")
```
This saves the `DocumentSummaryIndex` to disk. The argument to `persist()` is the directory where the index will be saved; here it is a directory named `"index"`.

To load the index back, use the `load_index_from_storage()` function from the `llama_index.core` module:

```python
from llama_index.core import load_index_from_storage
from llama_index.core import StorageContext

# Rebuild the storage context from the persisted directory
storage_context = StorageContext.from_defaults(persist_dir="index")

# Load the DocumentSummaryIndex from it
doc_summary_index = load_index_from_storage(storage_context)
```
This rebuilds the `storage_context` and loads the `DocumentSummaryIndex` from the specified directory.

A document store holds `Node` objects. It is designed to store and manage these nodes so that different indices can use them without duplicating data. `SimpleDocumentStore` stores `Node` objects in memory. They can be persisted to (and loaded from) disk by calling `docstore.persist()` (and `SimpleDocumentStore.from_persist_path(...)`, respectively).

```python
import os

from llama_index.core.storage.docstore import SimpleDocumentStore

# Define the path of the saved document store file
persist_path = "/path/to/your/docstore"

# SimpleDocumentStore has no exists() check, so test the file path directly
if os.path.exists(persist_path):
    # Load the document store
    docstore = SimpleDocumentStore.from_persist_path(persist_path)
else:
    print("Document store does not exist.")
```

Replace `"/path/to/your/docstore"` with the actual path of your saved document store file. `os.path.exists()` checks whether a file exists at that path, and `from_persist_path()` loads it.

To keep the embeddings in a Chroma collection instead, build the index on a `ChromaVectorStore`:

```python
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create the Chroma client (connects to a running Chroma server)
remote_db = chromadb.HttpClient()

# Get or create the collection
chroma_collection = remote_db.get_or_create_collection("quickstart")

# Create the vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create the storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the index (documents and embed_model must already be defined)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
```
To persist the index, use the `persist()` method. Here's an example:

```python
# Persist the docstore and index store (to ./storage by default)
storage_context.persist()
```

Replace `"quickstart"` with the name of your Chroma collection, and `documents` and `embed_model` with your actual documents and embedding model.

To load the index later, use the `load_index_from_storage()` function from the `llama_index.core` module. Here's an example:

```python
from llama_index.core import load_index_from_storage
from llama_index.core import StorageContext

# Define the directory where your index is saved
persist_dir = "/path/to/your/index"

# Rebuild the storage context
storage_context = StorageContext.from_defaults(persist_dir=persist_dir)

# Load the index from storage
index = load_index_from_storage(storage_context)
```

Replace `"/path/to/your/index"` with the actual path where your index is saved. `from_defaults()` recreates the `storage_context`, and `load_index_from_storage()` loads the index from the specified directory. If your embeddings live in Chroma, also recreate the Chroma client, the collection, and the `ChromaVectorStore` instance, and use them to create the `storage_context`.

To rebuild an index from a saved document store:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.storage.docstore import SimpleDocumentStore

# Define the path where your document store is saved
persist_path = "/path/to/your/docstore"

# Load the document store
docstore = SimpleDocumentStore.from_persist_path(persist_path)

# Get the stored nodes (docs maps node IDs to nodes)
nodes = list(docstore.docs.values())

# Rebuild the index from the nodes
index = VectorStoreIndex(
    nodes, storage_context=storage_context, embed_model=embed_model
)
```

Replace `"/path/to/your/docstore"` with the actual path where your document store is saved. `from_persist_path()` loads the document store from that path, and the `docs` property returns all stored nodes.

Step by step, first use the `from_persist_path()` method of the `SimpleDocumentStore` class to load the document store from disk:

```python
from llama_index.core.storage.docstore import SimpleDocumentStore

# Define the path where your document store is saved
persist_path = "/path/to/your/docstore"

# Load the document store
docstore = SimpleDocumentStore.from_persist_path(persist_path)
```

Then pass the stored nodes to the `VectorStoreIndex` constructor, together with the storage context and the embedding model, to rebuild the index. The constructor is used because `from_documents()` expects `Document` objects, while the document store returns nodes.

```python
from llama_index.core import VectorStoreIndex

# Get the stored nodes from the document store
nodes = list(docstore.docs.values())

# Rebuild the index from the nodes
index = VectorStoreIndex(
    nodes, storage_context=storage_context, embed_model=embed_model
)
```

Note that `SimpleDocumentStore` has no `get_all_documents()` method; its `docs` property returns a dictionary that maps IDs to the stored nodes. Refer to the LlamaIndex documentation or source code for the exact details.

When you create an index with `VectorStoreIndex.from_documents()`, it takes the documents, generates their embedding vectors with the provided embedding model, and stores these vectors. The original documents can be stored in a document store and persisted separately:

```python
from llama_index.core.storage.docstore import SimpleDocumentStore

# Create the document store
docstore = SimpleDocumentStore()

# Add documents to the document store
docstore.add_documents(documents)

# Persist the document store to a file
docstore.persist("/path/to/your/docstore")
```

```python
from llama_index.core import VectorStoreIndex

# Create the index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
```
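The split described here, vectors in the index and original text in a separately persisted document store, can be illustrated with a toy model. Everything below is hypothetical and stdlib-only: `toy_embed` is a keyword-count stand-in for a real embedding model, and `ToyVectorIndex` is not LlamaIndex's `VectorStoreIndex`. The sketch persists only the text, then rebuilds the index from it by embedding every node again.

```python
import json
import math

# Hypothetical vocabulary for the toy embedding
VOCAB = ("chroma", "embeddings", "docstore", "text")


def toy_embed(text):
    """Hypothetical embedding: normalized counts of a few vocabulary words."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorIndex:
    """Stores only node IDs and vectors; the text stays in the docstore."""

    def __init__(self, docstore, embed):
        self.embed = embed
        self.vectors = {nid: embed(text) for nid, text in docstore.items()}

    def query(self, text):
        """Return the ID of the node whose vector best matches the query."""
        q = self.embed(text)
        return max(
            self.vectors,
            key=lambda nid: sum(a * b for a, b in zip(q, self.vectors[nid])),
        )


# Original text lives in the document store, keyed by node ID
docstore = {
    "n1": "chroma stores the embeddings",
    "n2": "the docstore stores the text",
}
index = ToyVectorIndex(docstore, toy_embed)

# Persist the docstore on its own, then rebuild the index from it
saved = json.dumps(docstore)
rebuilt = ToyVectorIndex(json.loads(saved), toy_embed)

best_id = rebuilt.query("which node holds the text")
print(best_id, "->", docstore[best_id])  # n2 -> the docstore stores the text
```

The query returns an ID, and the text is looked up in the document store, so the two stores can be persisted and reloaded independently.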
Replace `documents` with your actual documents, `"/path/to/your/docstore"` with the path where you want to save your document store, `storage_context` with your actual storage context, and `embed_model` with your actual embedding model.

To load the index from an existing Chroma collection:

```python
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Initialize the Chroma client
remote_db = chromadb.HttpClient()

# Get the collection
chroma_collection = remote_db.get_or_create_collection("quickstart")

# Create the vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create the storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load the index directly from the vector store
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)
```
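Loading from the vector store reuses the vectors that are already stored, whereas rebuilding from the documents embeds every document again. The toy below counts embedding calls to make that difference visible. `CountingEmbedder`, `build_from_documents`, and `load_from_vectors` are hypothetical, stdlib-only helpers, not LlamaIndex or Chroma functions.

```python
class CountingEmbedder:
    """Hypothetical embedding model that counts how many texts it embedded."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        return [float(len(text))]  # placeholder one-dimensional vector


def build_from_documents(documents, embed):
    """Embed every document and return the stored vectors, keyed by ID."""
    return {doc_id: embed(text) for doc_id, text in documents.items()}


def load_from_vectors(stored_vectors):
    """Reuse vectors that are already stored; no embedding is needed."""
    return dict(stored_vectors)


documents = {"d1": "first document", "d2": "second document", "d3": "third"}

# Initial build: one embedding call per document
embedder = CountingEmbedder()
stored = build_from_documents(documents, embedder)

# Later session, option 1: reconnect to the stored vectors
calls_before = embedder.calls
reloaded = load_from_vectors(stored)
load_calls = embedder.calls - calls_before

# Later session, option 2: rebuild from the documents
calls_before = embedder.calls
rebuilt = build_from_documents(documents, embedder)
rebuild_calls = embedder.calls - calls_before

print(load_calls, rebuild_calls)  # 0 3
```

With a real embedding model each call costs time (and often money), so reconnecting to the stored vectors is the cheaper path when the documents have not changed.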
To add new documents to an existing index, use the index's `insert()` method (`VectorStoreIndex` has no `add_documents()` method). Here's an example:

```python
# Add new documents to the index one at a time
for doc in new_documents:
    index.insert(doc)
```

Replace `"quickstart"` with the name of your Chroma collection, and `new_documents` with your actual new documents.

To get the total number of items stored in the index, call `count()` on the underlying Chroma collection. The index object and the `ChromaVectorStore` instance may not provide a `count()` method, and Chroma collections have no `count_documents()` method (that name belongs to MongoDB's API). Here's an example:

```python
# Count the records in the Chroma collection backing the index
total_items = chroma_collection.count()
print(f"Total items: {total_items}")
```

Here, `chroma_collection` is the collection you passed to your `ChromaVectorStore`. LlamaIndex stores one record per node, so when documents were split into chunks this counts chunks, not source documents.

To check whether the index exists, check whether that collection holds any records. `get_or_create_collection()` creates an empty collection when none exists, so an empty collection means nothing has been indexed yet:

```python
# Check if the index exists
if chroma_collection.count() > 0:
    print("Index exists.")
else:
    print("Index does not exist.")
```
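Whatever count you read from the vector store reflects stored records, and each record is a node, so a count, an existence check, and an insert all operate on nodes rather than on source documents. The toy below puts those three together. `split_into_nodes` (a fixed-size word splitter, not LlamaIndex's `SentenceSplitter`) and `ToyCollection` are hypothetical, stdlib-only stand-ins; the `ref_doc_id` field only loosely mirrors how nodes remember their source document.

```python
def split_into_nodes(doc_id, text, words_per_node=4):
    """Hypothetical splitter: fixed-size word windows tagged with their source."""
    words = text.split()
    return [
        {"node_id": f"{doc_id}-{i}", "ref_doc_id": doc_id,
         "text": " ".join(words[i:i + words_per_node])}
        for i in range(0, len(words), words_per_node)
    ]


class ToyCollection:
    """Hypothetical stand-in for a vector store collection: one record per node."""

    def __init__(self):
        self.records = {}

    def insert(self, doc_id, text):
        """Store the nodes of one document; existing records are untouched."""
        for node in split_into_nodes(doc_id, text):
            self.records[node["node_id"]] = node

    def count(self):
        """Number of stored records (nodes), like a collection count."""
        return len(self.records)

    def document_count(self):
        """Number of distinct source documents behind those records."""
        return len({r["ref_doc_id"] for r in self.records.values()})


collection = ToyCollection()
print(collection.count() > 0)  # False: nothing indexed yet

collection.insert("doc1", "one two three four five six seven eight nine")
collection.insert("doc2", "alpha beta gamma")
print(collection.count(), collection.document_count())  # 4 2
```

The nine-word document becomes three nodes and the three-word one becomes a single node, so the record count (4) is larger than the document count (2).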