Error

I'm using MistralAIEmbedding along with ChromaDB as a vector store.

Getting the following error now: chromadb.errors.InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 1024 (even though Mistral embeddings have a dimensionality of 1024)

For example, with faiss I can specify the dimensionality:

d = 1024
faiss_index = faiss.IndexFlatL2(d)

d = property(_swigfaiss.Index_d_get, _swigfaiss.Index_d_set, doc=r""" vector dimension""")

^ I don't see anything like this in Chroma. Is this possible to do using ChromaVectorStore?
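
For what it's worth, from poking around, plain Chroma doesn't seem to take a dimension argument anywhere; as far as I can tell the collection just locks onto the dimensionality of the first vectors written to it (rough sketch, path and names are placeholders):

import chromadb

client = chromadb.PersistentClient(path="./some_fresh_dir")
collection = client.get_or_create_collection(name="test")

# No dimension parameter anywhere; the first add() seems to fix the size
collection.add(
    ids=["probe-0"],
    embeddings=[[0.0] * 1024],   # a 1024-dim vector locks the collection to 1024
    documents=["placeholder"],
)
# Any later add/query with a different dimension raises InvalidDimensionException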
6 comments
I think the error here is less about setting the embed dim, and more that somewhere in your code you are mixing embedding models/dimensions

So either when you ingest, you are ingesting into a collection that has a different size

Or when querying, you are not embedding with the same model used to build the index
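
One quick sanity check, assuming the collection already has data in it: compare the dimension of an embedding stored in the collection with the dimension your current embed model produces (chroma_collection and embed_model below stand for your own objects):

Plain Text
# Dimension the existing collection was built with
stored = chroma_collection.get(limit=1, include=["embeddings"])
print(len(stored["embeddings"][0]))

# Dimension your current embedding model produces at query time
query_vec = embed_model.get_query_embedding("test")
print(len(query_vec))

If those two numbers differ (e.g. 1536 vs 1024), that's your mismatch.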
Do you see anything in here?

Plain Text
# llama-index 0.9-style imports
import json

import chromadb
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.llms import Ollama
from llama_index.embeddings import MistralAIEmbedding
from llama_index.vector_stores import ChromaVectorStore


class Configuration:
    def __init__(self):
        self.initialize()

    def initialize(self):
        # Local LLM served by Ollama
        self.llm = Ollama(model="mixtral:8x7b-instruct-v0.1-q6_K", base_url="http://192.168.0.105:1234")

        # Mistral's hosted embedding model (1024-dim vectors); MISTRAL_API_KEY is defined elsewhere
        self.embed_model = MistralAIEmbedding(model_name="mistral-embed",
                                              api_key=MISTRAL_API_KEY)

        # Persistent Chroma collection on disk
        self.client = chromadb.PersistentClient(path="./dbs/vector_db")
        self.chroma_collection = self.client.get_or_create_collection(name="test")

        # Wire the Chroma collection into llama-index
        self.vector_store = ChromaVectorStore(chroma_collection=self.chroma_collection)
        self.storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
        self.service_context = ServiceContext.from_defaults(llm=self.llm,
                                                            chunk_size=1024,
                                                            chunk_overlap=25,
                                                            embed_model=self.embed_model)


...<snip>...

Plain Text
def main():
    config = Configuration()
    document_list = []

    # Pull article rows (metadata JSON + text) from the processed-data db
    rows = extract_and_store_articles_info("./dbs/processed_data_test.db")

    for row in rows:
        metadata, text = json.loads(row[0]), row[1]
        documents = load_data(metadata, text)
        document_list.append(documents)

    # Index one document batch at a time so a failing batch can be skipped
    for document in document_list:
        print(document[0].metadata)
        try:
            VectorStoreIndex.from_documents(documents=document,
                                            service_context=config.service_context,
                                            storage_context=config.storage_context,
                                            show_progress=True)
        except ValueError as e:
            print(document, e)
            continue


Plain Text
Parsing nodes: 100%|[00:00<00:00, 150.73it/s]
Generating embeddings: 100%|
I just created this in a new folder also (multiple times). Am I just not seeing something here?
So this looks fine to me so far (assuming that chromadb.PersistentClient(path="./dbs/vector_db") is pointing to a location that previously did not exist)
Does the error happen when calling from_documents()? Or when querying?
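
If it helps narrow it down, here's a rough way to see what's actually sitting at that path (just a sketch, adjust the path to your setup):

Plain Text
import chromadb

client = chromadb.PersistentClient(path="./dbs/vector_db")

# List every collection stored at this path and how many vectors each holds
for col in client.list_collections():
    print(col.name, col.count())

A leftover collection from an earlier run (e.g. one built with 1536-dim embeddings) would show up here.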
Querying. Looks like I had a typo in the path and it was (as you mentioned) calling a different db. Working now.

The little things can be so frustrating 😉