
Updated 2 months ago

Hi, I want to index a corpus of data and

Hi, I want to index a corpus of data and store it directly in a chromadb instance.
But this code only generates a storage folder and then stores the index in a file instead of the chromadb vector store.
Can anyone help?

chromadb_vs = ChromaVectorStore(chroma_collection=chromdb_collection)

print("INFO: Initializing the Service Context")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

print("INFO: Creating Vector Store index object")
index = VectorStoreIndex.from_documents(
    documents=documents,
    vector_store=chromadb_vs,
    service_context=service_context,
    show_progress=True,
)

print("INFO: Writing to disk as persistance")
index.vector_store.persist()
6 comments
chroma persists automatically, no need to call .persist()
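Since Chroma's PersistentClient writes to disk on its own, one way to confirm what was actually stored is to reopen the same path and count the entries in the collection. This is only a minimal sketch: the helper name count_persisted is hypothetical, and the path "./test" and collection name "test" are assumed from the code posted in this thread.

```python
def count_persisted(path="./test", collection_name="test"):
    """Return how many entries the Chroma collection at `path` holds.

    Hypothetical helper; the defaults are assumed from this thread.
    """
    # Imported here so the module loads even where chromadb is not installed.
    import chromadb

    # Reopen the on-disk store; nothing needs to be "loaded" explicitly.
    client = chromadb.PersistentClient(path=path)

    # get_or_create_collection avoids an error if the collection is missing.
    collection = client.get_or_create_collection(collection_name)

    # count() returns the number of stored embeddings (0 means nothing was written).
    return collection.count()


if __name__ == "__main__":
    print("Entries in collection:", count_persisted())
```

If this prints 0 after indexing, the index was most likely built against the default in-memory vector store rather than the Chroma collection.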
Thanks @Logan M, but I still don't get the index stored in the chromadb file; its sqlite3 file is empty.
The code creates the chromadb file but doesn't store anything inside it.
This is my code:
import logging
import os, sys

import chromadb
from chromadb.config import Settings
from openai import OpenAI
from llama_index.embeddings import BaseEmbedding
from llama_index.callbacks import base_handler
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
    ServiceContext,
    callbacks,
)
from llama_index.vector_stores import ChromaVectorStore
from llama_index.llms import HuggingFaceLLM
from llama_index.retrievers import VectorIndexRetriever

query_str = ""
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceLLM(model_name=model_name, device_map="cpu")

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

print("INFO: Reading directory documents............")
documents = SimpleDirectoryReader("..\\datasets\\test").load_data(show_progress=True)

print("Initializing ChromadDB Collection")
chromdb = chromadb.PersistentClient(path="./test", settings=Settings(anonymized_telemetry=False))
chromdb_collection = chromdb.get_or_create_collection("test")
chromadb_vs = ChromaVectorStore(chroma_collection=chromdb_collection)

print("INFO: Initializing the Service Context")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

print("INFO: Creating Vector Store index object")
index = VectorStoreIndex.from_documents(
    documents=documents,
    vector_store=chromadb_vs,
    service_context=service_context,
    show_progress=True,
)
Thanks in advance
@Hoaz double check the link I sent. You missed using the storage context
πŸ‘πŸ» 🫣
The from_documents method in the BaseIndex class does not take a vector_store argument. The arguments it accepts are:

  • cls: The class type.
  • documents: A sequence of documents to build the index from.
  • storage_context: An optional storage context. If not provided, it will use the default storage context.
  • service_context: An optional service context. If not provided, it will use the default service context.
  • show_progress: A boolean indicating whether to show progress or not.
  • **kwargs: Any additional keyword arguments.
So, the correct usage of the method would be:

index = VectorStoreIndex.from_documents(documents=documents, service_context=service_context, show_progress=True)


If you need to use a specific vector store, you should set it in the storage_context before calling from_documents.
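As a rough illustration of that last point, the sketch below attaches the Chroma vector store through a StorageContext and passes that to from_documents. It assumes the legacy (pre-0.10) llama_index import paths used elsewhere in this thread; the helper name build_chroma_index and its default path and collection name are hypothetical, chosen to match the code posted above.

```python
def build_chroma_index(documents, service_context, path="./test", collection_name="test"):
    """Build a VectorStoreIndex whose embeddings are written to a Chroma collection.

    Hypothetical helper; import paths assume llama_index < 0.10 as used in this thread.
    """
    # Imported here so the module loads even where the libraries are not installed.
    import chromadb
    from llama_index import StorageContext, VectorStoreIndex
    from llama_index.vector_stores import ChromaVectorStore

    # Open (or create) the on-disk Chroma store and the target collection.
    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(collection_name)
    vector_store = ChromaVectorStore(chroma_collection=collection)

    # The vector store is handed over via the storage context,
    # not as a vector_store keyword argument to from_documents.
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        service_context=service_context,
        show_progress=True,
    )
    return index, collection
```

With the documents and service_context from the question, this would be called as index, collection = build_chroma_index(documents, service_context). No persist() call is needed afterwards, since the persistent Chroma client writes to disk by itself.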