llama-index will embed your data before inserting, skipping any embedding model used by chroma
this all happens automatically assuming you used the vector store integration
yes, I realize that... Unfortunately, LlamaIndex uses the OpenAI embeddings by default, which means a regular API call. I would like to establish a RAG setup that uses the local machine for embedding and LLM calls, while also storing the vector database locally. Chroma works great for storing vectors locally and also works in combination with LlamaIndex - however, all the embedding calls and LLM calls go to OpenAI. There is a Hugging Face interface that allows calling Hugging Face, but that turns out to create the same issue (remote calls). Since llama3 runs fairly fast locally with Ollama, I am looking for a way to create embeddings locally and also use llama3 for the LLM calls... Any suggestion / help highly appreciated!!
we have multiple embedding integrations. Local huggingface embeddings, embeddings through ollama, etc.
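for example, something like this keeps everything on your machine (a minimal sketch - the HF model name "BAAI/bge-small-en-v1.5" is just an illustrative choice, and the Ollama option assumes you've already pulled the embedding model locally):

# option 1: local HuggingFace embeddings, runs in-process, no API calls
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# option 2: embeddings served by a local Ollama instance
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")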
Yes, but there appears to be a mismatch when I use ChromaDB to make the vector store persistent, due to the default embedding used by ChromaDB (all-MiniLM-L6-v2) vs. my settings in LlamaIndex (Settings.llm = Ollama(model='llama3'), Settings.embed_model = OllamaEmbedding(model_name='nomic-embed-text', base_url=...)). In any case, the similarities returned are way off. Any help on how to get this to work greatly appreciated
LlamaIndex embeds anything before inserting, and also embeds the query for you, using the model from Settings
Assuming you both created and queried the index with the same embedding model, it should be fine (and works fine for me 🤷♂️ )
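if you want to rule out any mismatch, you can also pass the embed model explicitly on both paths instead of relying on Settings (sketch only - storage_context / vector_store set up with ChromaVectorStore as usual, and the same OllamaEmbedding used in both places):

embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# build: documents are embedded with embed_model before being inserted into chroma
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

# load later: queries against the existing collection are embedded with the same model
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)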
Thanks for engaging with me, Logan! Yes, LlamaIndex by itself works... However, if I try to use an Ollama-based embedding (Nomic) for RAG, somehow things don't work anymore. My understanding is that ChromaDB has a different default embedding. I enclose three examples: 1) OpenAI embeddings and OpenAI LLM: works!, 2) OpenAI embeddings and local llama3 LLM w/ Ollama: works, 3) local Nomic embedding w/ Ollama and local llama3 LLM w/ Ollama: DOES NOT WORK. Similarity scores for 1) and 2) are the expected .4, .6, etc.; similarity scores for 3) are 1-e-286... For me a clear indication that something is wrong with the embeddings. By the way, things also work when using Hugging Face embeddings, but I am interested in using the Nomic embedding via Ollama. Any help greatly appreciated, I haven't been able to find a solution for a week...
I'm not sure man, seems like a bug with chroma itself, especially since it works fine with other embedding models
while the scores are wonky, it seems to be retrieving the correct nodes in my tests
import os

import chromadb
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# fully local: nomic-embed-text for embeddings, llama3 for the LLM, both via Ollama
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3", request_timeout=3600)

documents = SimpleDirectoryReader("./docs/docs/examples/data/paul_graham").load_data()

if os.path.exists("./chroma_test"):
    # collection was built on a previous run -- just reconnect to it
    db = chromadb.PersistentClient(path="./chroma_test")
    chroma_collection = db.get_or_create_collection("chroma_test")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    index = VectorStoreIndex.from_vector_store(vector_store)
else:
    # first run -- embed the documents with Settings.embed_model and store them in chroma
    db = chromadb.PersistentClient(path="./chroma_test")
    chroma_collection = db.get_or_create_collection("chroma_test")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

retriever = index.as_retriever()
nodes = retriever.retrieve("What happened at Viaweb?")

print(nodes[0].score)
print(nodes[0].text)
print(nodes[1].score)
print(nodes[1].text)
Both retrieved chunks mention Viaweb
I checked the sqlite database generated by Chroma, and the interesting thing is that when everything is left at the defaults, the OpenAI embedding is used, leading to embeddings of 1536 dimensions. Specifying Nomic as the embedding, this dimension changes (correctly) to 672. The default embedding of ChromaDB, all-MiniLM-L6-v2, also generates 672 dimensions. However, while the dimensions are the same (no error message!!), I suspect the ridiculous similarity scores indicate a mix-up between Nomic, which is specified via LlamaIndex, and all-MiniLM-L6-v2, which is used by ChromaDB - while for some reason, when using the default, i.e. OpenAI Ada with 1536 dimensions, that is correctly communicated to ChromaDB during inference, leading to 'good' similarity scores. Do you have any way to look into this?
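A quick way to check this directly (diagnostic sketch, using the variable names from the script above; the query string is just the sample query from that script) is to compare the dimension the query-time embedder produces against what is actually stored in the collection:

# what the query-time embedder produces
query_emb = Settings.embed_model.get_query_embedding("What happened at Viaweb?")
print("query embedding dim:", len(query_emb))

# what is actually stored in the chroma collection
stored = chroma_collection.get(limit=1, include=["embeddings"])
print("stored embedding dim:", len(stored["embeddings"][0]))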
...Interestingly enough, when I use a Hugging Face embedding, 'sentence-transformers/paraphrase-MiniLM-L6-v2', I also get good results!! That model is very close to ChromaDB's default embedding (all-MiniLM-L6-v2), so this fits with my earlier hypothesis that somehow the alternative local embedding is not properly communicated to ChromaDB during retrieval
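One way to test that hypothesis independently of Chroma's own scoring would be to recompute the similarity by hand: embed the query with the model configured in LlamaIndex, pull the stored vectors back out of the collection, and compare cosine similarities with the scores the retriever reports (sketch, assuming numpy is available and reusing the names from the script above):

import numpy as np

query_emb = np.array(Settings.embed_model.get_query_embedding("What happened at Viaweb?"))

stored = chroma_collection.get(include=["embeddings", "documents"])
for doc, emb in zip(stored["documents"], stored["embeddings"]):
    emb = np.array(emb)
    # cosine similarity between the query vector and the stored chunk vector
    cos = float(np.dot(query_emb, emb) / (np.linalg.norm(query_emb) * np.linalg.norm(emb)))
    print(round(cos, 3), doc[:60])

If these hand-computed cosines look reasonable while the retriever scores don't, the stored vectors do come from the model set in Settings, and the oddity is in how the distances get turned into scores rather than in which model embedded the data.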