Error

At a glance

The community member is using MistralAIEmbedding with ChromaDB as a vector store and is hitting the error "chromadb.errors.InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 1024" (Mistral's mistral-embed produces 1024-dimensional vectors). The other community members point out that the error usually means embedding models/dimensions are being mixed, either when ingesting into a collection built with a different size or when querying with a different model than the one used to build the index. After reviewing the posted configuration, they suggest checking the path used for the ChromaDB client. The issue is resolved when the community member fixes a typo in the path, which had been pointing at a different database; as they put it, "the little things can be so frustrating".

I'm using MistralAIEmbedding along with ChromaDB as a vector store.

Getting the following error now: chromadb.errors.InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 1024 (Mistral has a dimensionality of 1024).

For example, using faiss I can specify the dimensionality:

d = 1024
faiss_index = faiss.IndexFlatL2(d)

d = property(_swigfaiss.Index_d_get, _swigfaiss.Index_d_set, doc=r""" vector dimension""")

^ I don't see anything like this in Chroma. Is this possible to do using ChromaVectorStore?
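As far as I know, Chroma doesn't take an explicit dimension the way faiss does: a collection's dimensionality gets pinned by the first embeddings written to it, and every later add or query has to match. Here's a toy Python sketch of that behavior (purely illustrative, not Chroma's actual implementation):

```python
class ToyCollection:
    """Toy stand-in for a Chroma collection: the dimension is not
    declared up front but fixed by the first embeddings added."""

    def __init__(self):
        self.dim = None          # unknown until the first add
        self.embeddings = []

    def add(self, embeddings):
        for e in embeddings:
            if self.dim is None:
                self.dim = len(e)        # first insert pins the dimension
            elif len(e) != self.dim:
                raise ValueError(
                    f"Embedding dimension {len(e)} does not match "
                    f"collection dimensionality {self.dim}")
            self.embeddings.append(list(e))


col = ToyCollection()
col.add([[0.1] * 1024])      # Mistral-sized vectors pin dim to 1024
try:
    col.add([[0.2] * 1536])  # a 1536-dim vector now fails
except ValueError as e:
    print(e)
```

So a 1536-vs-1024 error means the collection was first populated (or is being queried) with vectors from a different embedding model than expected.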
L
s
6 comments
I think the error here is less about setting the embed dim, and more that somewhere in your code you are mixing embedding models/dimensions

So either when you ingest, you are ingesting into a collection that has a different size

Or when querying, you are not embedding with the same model used to build the index
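One cheap guard against this kind of mix-up is to compare the query vector's size against the collection's before querying. A hypothetical helper (the function and names are illustrative, not a Chroma API; the 1024 figure for mistral-embed comes from this thread):

```python
def assert_matching_dim(collection_dim, query_embedding):
    """Fail fast if the query vector cannot belong to this collection."""
    if len(query_embedding) != collection_dim:
        raise ValueError(
            f"Query embedding has dim {len(query_embedding)}, but the "
            f"collection was built with dim {collection_dim}. "
            "Are ingest and query using the same embed_model?")


# e.g. a collection built with mistral-embed (1024-dim):
assert_matching_dim(1024, [0.0] * 1024)   # OK, no exception
```

A mismatch here fails with a readable message before Chroma's InvalidDimensionException ever fires.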
Do you see anything in here?

Plain Text
class Configuration:
    def __init__(self):
        self.initialize()

    def initialize(self):
        self.llm = Ollama(model="mixtral:8x7b-instruct-v0.1-q6_K", base_url="http://192.168.0.105:1234")
        
        self.embed_model = MistralAIEmbedding(model_name="mistral-embed", 
                                              api_key=MISTRAL_API_KEY)
        
        self.client = chromadb.PersistentClient(path="./dbs/vector_db")
        self.chroma_collection = self.client.get_or_create_collection(name="test")
        
        self.vector_store = ChromaVectorStore(chroma_collection=self.chroma_collection)
        self.storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
        self.service_context = ServiceContext.from_defaults(llm=self.llm,
                                                    chunk_size=1024,
                                                    chunk_overlap=25,
                                                    embed_model=self.embed_model)


...<snip>...

Plain Text
def main():
    config = Configuration()
    document_list = []
    
    rows = extract_and_store_articles_info("./dbs/processed_data_test.db")
    
    for row in rows:
        metadata, text = json.loads(row[0]), row[1]
        documents = load_data(metadata, text)
        document_list.append(documents)

    for document in document_list:
        print(document[0].metadata)    
        try:
            VectorStoreIndex.from_documents(documents=document,
                                            service_context=config.service_context, 
                                            storage_context=config.storage_context,
                                            show_progress=True)
        except ValueError as e:
            print(document, e)
            continue


Plain Text
Parsing nodes: 100%|[00:00<00:00, 150.73it/s]
Generating embeddings: 100%|
I also just created this in a new folder (multiple times). Am I just not seeing something here?
So this looks fine to me so far (assuming that chromadb.PersistentClient(path="./dbs/vector_db") is pointing to a location that previously did not exist)
Does the error happen when calling from_documents()? Or when querying?
Querying. Looks like I had a typo in the path and it was (as you mentioned) calling a different db. Working now.

The little things can be so frustrating πŸ˜‰
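Since the fix turned out to be a path typo, one quick sanity check before creating the client is to print where a relative path actually resolves (path taken from the thread; the check itself is just a suggestion):

```python
import os

# Relative paths resolve against the current working directory, so a
# typo -- or launching the script from a different folder -- can
# silently point Chroma at a different (or brand-new) database.
path = "./dbs/vector_db"
resolved = os.path.abspath(path)
print("PersistentClient would use:", resolved)
print("Directory already exists:", os.path.isdir(resolved))
```

If "Directory already exists" prints False on a run that should be reusing an existing store, the client is about to create a fresh, empty database somewhere you didn't intend.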