I am trying to use llamaindex for a

At a glance

The community member is trying to use llamaindex for a local retrieval-augmented generation (RAG) system, using Ollama-based Nomic embeddings and an Ollama-based LLM (llama3). They are using ChromaDB as the local persistent vector store, but are facing an issue where the similarity scores are way off, likely due to different embeddings being used by ChromaDB and llamaindex.

The community members have tried various approaches, including using OpenAI embeddings and LLM, as well as Hugging Face embeddings, but the issue persists when using the Nomic embeddings via Ollama. They suspect that there is a mismatch between the Nomic embeddings specified in llamaindex and the default all-MiniLM-L6-v2 embeddings used by ChromaDB.

The community members have provided code examples and discussed the issue with the llamaindex developer, but a solution has not been found yet. They are still looking for a way to use the local Nomic embeddings and llama3 LLM while also storing the vector database locally with ChromaDB.

I am trying to use llamaindex for a local RAG system using Ollama-based Nomic embeddings, Ollama-based llama3 as the LLM, and a locally persistent vector store like Chroma. The problem appears to be that Chroma uses its own embedding, and I am not sure how to pass the Ollama-based Nomic embeddings to the Chroma vector store. The result is that when I do my RAG, the similarity index is way off, my hypothesis being that this is due to the different embeddings used... Any help or advice for a different vector store that I can run locally and which also allows filtering by metadata is highly appreciated..
12 comments
llama-index will embed your data before inserting, skipping any embedding model used by chroma
this all happens automatically assuming you used the vector store integration
yes, I realize that... Unfortunately llamaindex by default uses the OpenAI embeddings, which means a regular API call. I would like to establish a RAG setup that uses the local machine for embedding and LLM calls, while at the same time also storing the vector database locally. Chroma works great for storing vectors locally, and also works in combination with llamaindex - however, all the embedding calls and LLM calls go to OpenAI. There is a Hugging Face interface, which allows calling Hugging Face, but that turns out to create the same issue (remote calls). Since llama3 runs fairly fast locally with Ollama, I am looking for a way to create embeddings locally and also use llama3 for LLM calls..... Any suggestion / help highly appreciated!!
we have multiple embedding integrations. Local huggingface embeddings, embeddings through ollama, etc.
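A minimal sketch of the Ollama embedding integration mentioned here (assuming the llama-index-embeddings-ollama package is installed; the model name and base_url are just examples):

Python
# Point llama-index at a local Ollama embedding model instead of the OpenAI default.
# The base_url below is Ollama's standard local endpoint; adjust it if yours differs.
from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434",
)

# A local Hugging Face model works the same way via HuggingFaceEmbedding
# from llama_index.embeddings.huggingface (see the example further down).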
Yes, but there appears to be a mismatch when I use chromadb to make the vector store persistent, due to the default embedding used by chromadb (all-MiniLM-L6-v2) vs my settings in llama index (Settings.llm = Ollama(model='llama3'), Settings.embed_model = OllamaEmbedding(model_name='nomic-embed-text', base_url=...))... in any case, the similarities returned are way off. Any help on how to get this to work is greatly appreciated
Llama index embeds anything before inserting, and also embeds the query for you, using the model from settings

Assuming you both created and queried the index with the same embedding model, it should be fine (and works fine for me 🤷‍♂️ )
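A minimal sketch of that point (paths and model names are illustrative): when reloading a persisted Chroma collection, the same local embedding model has to be active at query time, either via Settings or passed explicitly:

Python
# When reloading an existing collection, make sure the query-time embedding model
# matches the one the documents were indexed with; otherwise queries are embedded
# with the default (OpenAI) model and the similarities become meaningless.
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_test")
collection = db.get_or_create_collection("chroma_test")
vector_store = ChromaVectorStore(chroma_collection=collection)

# Pass the embed model explicitly so reloading does not fall back to the default
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=OllamaEmbedding(model_name="nomic-embed-text"),
)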
Thanks for engaging with me, Logan! Yes, llamaindex by itself works.... however, if I try to use an Ollama-based embedding (Nomic) for RAG, somehow things don't work anymore. My understanding is that ChromaDB has a different default embedding. I enclose three examples: 1) OpenAI embeddings and OpenAI LLM: works!, 2) OpenAI embeddings and local llama3 LLM w/ Ollama: works, 3) local Nomic embedding w/ Ollama and local llama3 LLM w/ Ollama: DOES NOT WORK. Similarity scores for 1) and 2) are the expected .4, .6 etc.; similarity scores for 3) are around 1e-286.... For me, a clear indication that something is wrong with the embeddings. By the way, things also work when using Hugging Face embeddings, but I am interested in using the Nomic embedding via Ollama. Any help greatly appreciated, I haven't been able to find a solution for a week.....
I'm not sure man, seems like a bug with chroma itself, especially since it works fine with other embedding models
while the scores are wonky, it seems to be retrieving the correct nodes in my tests
Python
import os
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Route embedding and LLM calls to the local Ollama models instead of the OpenAI defaults
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3", request_timeout=3600)

documents = SimpleDirectoryReader("./docs/docs/examples/data/paul_graham").load_data()

# Reuse the persisted Chroma collection if it already exists; otherwise embed the documents and build it
if os.path.exists("./chroma_test"):
  db = chromadb.PersistentClient(path="./chroma_test")
  chroma_collection = db.get_or_create_collection("chroma_test")
  vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

  index = VectorStoreIndex.from_vector_store(vector_store)
else:
  db = chromadb.PersistentClient(path="./chroma_test")
  chroma_collection = db.get_or_create_collection("chroma_test")
  vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
  storage_context = StorageContext.from_defaults(vector_store=vector_store)

  index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)


# The query is embedded with Settings.embed_model before searching the Chroma collection
retriever = index.as_retriever()
nodes = retriever.retrieve("What happened at Viaweb?")
print(nodes[0].score)
print(nodes[0].text)
print(nodes[1].score)
print(nodes[1].text)


Both retrieved chunks mention Viaweb
I checked the sqlite database generated by Chroma, and the interesting thing is that when everything is left at the defaults, the OpenAI embedding is used, leading to an embedding with 1536 dimensions. Specifying Nomic as the embedding, this dimension changes (correctly) to 672. The default embedding of chromadb is also an embedding that generates 672 dimensions, i.e. all-MiniLM-L6-v2. However, while the dimensions are the same (no error message!!), I suspect the ridiculous similarity scores indicate a mix-up between Nomic, which is specified by llamaindex, and all-MiniLM-L6-v2, which is used by ChromaDB - while for some reason, when using the default, i.e. OpenAI Ada with 1536 dimensions, that is correctly communicated to ChromaDB during inference, leading to 'good' similarity scores. Do you have any way to look into this?
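A minimal sketch (reusing the ./chroma_test collection from the example above; adjust the path and name for your own setup) of checking the dimensionality of the stored vectors directly through the chromadb client, instead of reading the sqlite file:

Python
import chromadb

# Open the persisted store and pull one stored embedding to inspect its dimensionality
db = chromadb.PersistentClient(path="./chroma_test")
collection = db.get_or_create_collection("chroma_test")

result = collection.get(limit=1, include=["embeddings"])
embedding = result["embeddings"][0]
print(f"stored embedding dimension: {len(embedding)}")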
....Interestingly enough, when I use the Hugging Face embedding 'sentence-transformers/paraphrase-MiniLM-L6-v2', I also get good results!! Interestingly, this is the default embedding from ChromaDB, so it fits with my earlier hypothesis that somehow the alternative local embedding is not properly communicated to ChromaDB during retrieval
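A minimal sketch of that Hugging Face configuration (assuming the llama-index-embeddings-huggingface package is installed; the model runs locally via sentence-transformers, so no remote embedding calls are made):

Python
# Local Hugging Face embeddings as an alternative to the Ollama-based Nomic model
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/paraphrase-MiniLM-L6-v2"
)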