The community member is experiencing an issue when working with ChromaDB and loading an index from a VectorStoreIndex. They are getting an InvalidDimensionException error, which they suspect arises because the default embedding model ChromaDB used to build the collection does not produce the same embedding dimensions as the model used on the LlamaIndex/OpenAI side.
Other community members suggest that the issue is likely due to a mismatch between the embedding model used in the LlamaIndex library and the one used to create the index. They recommend checking the embedding model used by ChromaDB, which is reportedly the "sentence-transformers/all-MiniLM-L6-v2" model. One community member confirms that setting the HuggingFaceEmbeddings to use this model resolves the issue.
When working with chromadb directly, and loading the index from VectorStoreIndex.from_vector_store(), I get the following error when using the chat_repl()
chromadb.errors.InvalidDimensionException: Embedding dimension 768 does not match collection dimensionality 384
I am using OpenAI as the LLM. I'm assuming this is because when I do chroma_collection.upsert() (via their API), it uses their default embedding model, which doesn't match the dimensions that OpenAI expects?
Yeah, so there's two models in llama-index: the LLM and the embedding model
It looks like whichever embedding model you are using in llama-index is not the same as the embedding model that created the index. These need to be the same
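To see why the mismatch blows up at query time rather than at load time, here is a toy sketch (plain Python, not Chroma's actual implementation): the collection effectively locks in the dimensionality of the first vectors written to it, and every later embedding, including query embeddings, must match.

```python
# Toy illustration of Chroma's dimensionality check: the collection
# fixes its dimensionality from the first vector it receives, then
# rejects any embedding of a different length.
class ToyCollection:
    def __init__(self):
        self.dim = None
        self.vectors = []

    def upsert(self, embedding):
        if self.dim is None:
            self.dim = len(embedding)  # first insert fixes the dimensionality
        elif len(embedding) != self.dim:
            raise ValueError(
                f"Embedding dimension {len(embedding)} does not match "
                f"collection dimensionality {self.dim}"
            )
        self.vectors.append(embedding)


if __name__ == "__main__":
    collection = ToyCollection()
    # Chroma's default model (all-MiniLM-L6-v2) emits 384-dim vectors,
    # so indexing through Chroma's API locks the collection at 384.
    collection.upsert([0.0] * 384)
    try:
        # A different embedding model on the llama-index side emits
        # 768-dim vectors, so the query-time upsert/lookup fails.
        collection.upsert([0.0] * 768)
    except ValueError as e:
        print(e)  # mirrors the InvalidDimensionException message
```

This is why the fix is to make both sides use the same embedding model, not to change the LLM.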
Worked! If anyone else is using chromadb in a different pipeline, be sure to set HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
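For anyone wiring this up end to end, a minimal sketch of pointing LlamaIndex at the same model Chroma used. This is an assumption-laden config fragment, not a verified recipe: `HuggingFaceEmbeddings` is LangChain's class (as in the post above), wrapped for LlamaIndex via `LangchainEmbedding`; import paths here follow older llama-index releases and may differ in your version, and the client path and collection name are hypothetical placeholders.

```python
# Sketch (older llama-index API, paths may differ in your version):
# make the LlamaIndex embedding model match the 384-dim model that
# Chroma's default embedding function used to build the collection.
import chromadb
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import LangchainEmbedding
from llama_index.vector_stores import ChromaVectorStore

# Same model as Chroma's default embedding function (384 dims).
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
# OpenAI remains the LLM; only the embedding model is overridden.
service_context = ServiceContext.from_defaults(embed_model=embed_model)

client = chromadb.PersistentClient(path="./chroma")        # hypothetical path
collection = client.get_or_create_collection("my_docs")    # hypothetical name
vector_store = ChromaVectorStore(chroma_collection=collection)

index = VectorStoreIndex.from_vector_store(
    vector_store, service_context=service_context
)
index.as_chat_engine().chat_repl()
```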