Updated 3 months ago

Error adding to a collection in ChromaDB:


collection_name = "name"
vector_store = ChromaVectorStore(chroma_collection=collection_name)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

raw_index = VectorStoreIndex.from_documents(
    parsed_docs,
    storage_context=storage_context,
    embed_model=Settings.embed_model
)




---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-41-79eec0778777> in <cell line: 7>()
      5 storage_context = StorageContext.from_defaults(vector_store=vector_store)
      6 
----> 7 raw_index = VectorStoreIndex.from_documents(
      8     parsed_docs,
      9     storage_context=storage_context,

6 frames
/usr/local/lib/python3.10/dist-packages/llama_index/vector_stores/chroma/base.py in add(self, nodes, **add_kwargs)
    263             documents.append(node.get_content(metadata_mode=MetadataMode.NONE))
    264 
--> 265         self._collection.add(
    266             embeddings=embeddings,
    267             ids=ids,

AttributeError: 'str' object has no attribute 'add'
19 comments
This isn't how you use chroma
Plain Text
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
is one example
many on this page
Thanks, Logan.
I am following the llama-parse example at https://github.com/run-llama/llama_parse/blob/main/examples/demo_advanced.ipynb and building a raw_index and a recursive_index. I was able to build the indices in ChromaDB; however, how do I load them from disk? Here's the example I am referring to:

Plain Text
# save to disk

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

# load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)

# Query Data from the persisted index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"{response}"))
My code:

Plain Text
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

raw_index = VectorStoreIndex.from_documents(
    parsed_docs,
    storage_context=storage_context,
    embed_model=Settings.embed_model
)

recursive_index = VectorStoreIndex(
    nodes=base_nodes + objects,
    storage_context=storage_context,
    embed_model=Settings.embed_model
)

I am trying to load the raw and recursive separately, and not sure where to specify in VectorStoreIndex.from_vector_store
The example shows the loading

You'd just create two vector store objects, one for each index, and call VectorStoreIndex.from_vector_store(vector_store)
Plain Text
# load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)
in the same collection or a different collection?
it would be a collection per index
ah ok, makes sense
So, I am getting an AssertionError when attempting to run: response_1 = raw_query_engine.query(query)
Code:

Plain Text
from llama_index.postprocessor.flag_embedding_reranker import (
    FlagEmbeddingReranker,
)

llm = MistralAI(
                model="mistral-small-latest",
                api_key=userdata.get('MISTRAL_API_KEY')
               )


reranker = FlagEmbeddingReranker(
    top_n=5,
    model="sentence-transformers/all-MiniLM-L6-v2",
)

raw_query_engine = raw_index.as_query_engine(
                                              similarity_top_k=15,
                                              node_postprocessors=[reranker],
                                              llm=llm
                                            )

recursive_query_engine = recursive_index.as_query_engine(
                                                          similarity_top_k=15,
                                                          node_postprocessors=[reranker],
                                                          verbose=True,
                                                          llm=llm
                                                        )
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/all-MiniLM-L6-v2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Error:

Plain Text
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-48-4badad8cf032> in <cell line: 3>()
      1 query = "What is the  Section 8 Rent Income in March 2023 at The Tillicum Apartments?"
      2 
----> 3 response_1 = raw_query_engine.query(query)
      4 print("\n***********New LlamaParse+ Basic Query Engine***********")
      5 print(response_1)

7 frames
/usr/local/lib/python3.10/dist-packages/llama_index/postprocessor/flag_embedding_reranker/base.py in _postprocess_nodes(self, nodes, query_bundle)
     71                 scores = [scores]
     72 
---> 73             assert len(scores) == len(nodes)
     74 
     75             for node, score in zip(nodes, scores):

AssertionError: 
Hmm, I don't think the flag embedding reranker is meant to be used with that model?
ah that's right.