Just a quick question: if we create a service context with ServiceContext.from_defaults() to pass in a different sentence-transformers model, will VectorStoreIndex.from_documents() create the embedding data using that model when we pass in documents along with that ServiceContext object? Specifically when using ChromaDB.
Yes, if you want to use a different embed model than the default OpenAI one, you need to pass it in the service context, and all the embeddings will be created using the newly defined embedding model.
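A minimal sketch of that setup, assuming a pre-0.10 llama-index release (where ServiceContext still exists) and using BAAI/bge-small-en-v1.5 purely as an example sentence-transformers checkpoint:

from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding

# Any sentence-transformers model name works here; this one is just an example.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Indexes built with this service context will embed with the model above
# instead of the default OpenAI text-embedding-ada-002.
service_context = ServiceContext.from_defaults(embed_model=embed_model)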
Just to make sure, in this code where I'm using a Chroma collection with a name:

import chromadb
from llama_index import ServiceContext, StorageContext
from llama_index.vector_stores import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db")  # any Chroma client works here
chroma_collection = db.get_or_create_collection("chromadbcollectionname")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

even if I don't provide an embedding model in db.get_or_create_collection(), it will still create the embedding data for ChromaDB, correct?
When it's creating the collection (not just getting an existing one):

from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context, show_progress=False
)

with this final code, will it add the embedding data?
I think this method creates the collection in the DB.
For the Chroma API it creates the DB, and generally when using a custom model you have to pass it in.
When this code runs:

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context, show_progress=False
)

that is the part where the embeddings are generated.
And since you are passing the embed model from your side, it will use only that one to create the embeddings.
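If you want to double-check that the vectors actually landed in Chroma, a quick sanity check (a sketch, assuming the chroma_collection object from the snippet above) is to count and peek at the collection after building the index:

# After VectorStoreIndex.from_documents(...) has run:
print(chroma_collection.count())  # number of stored vectors; should match your chunk count

record = chroma_collection.peek(limit=1)  # peek() includes embeddings by default
print(len(record["embeddings"][0]))  # the dimension hints at the model, e.g. 384/768
                                     # for many sentence-transformers models vs 1536 for ada-002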
Okay, thanks.
I couldn't tell whether LlamaIndex was using the embed model passed into the service_context or the default ChromaDB embed model from the storage_context (since the ChromaVectorStore was built without passing in an embedding model).
You could try setting your service context as the global one. That way, if the storage context checks for a service context, it will pick up your globally defined one; see the sketch below.
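A sketch of that, assuming the set_global_service_context helper from the same pre-0.10 llama-index releases:

from llama_index import ServiceContext, set_global_service_context

service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Any component constructed without an explicit service_context will now
# fall back to this one, so your embed model is used everywhere.
set_global_service_context(service_context)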

Otherwise, I think it's best to check the source code to verify whether it uses the model you defined or its own.
Also, I found that you can set up the LlamaDebugHandler and check the embedding info there:

https://gpt-index.readthedocs.io/en/latest/examples/callbacks/LlamaDebugHandler.html

Maybe when you check for CBEventType.EMBEDDING you will get info on which embed model is being used. Just a hunch though 😅
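Roughly like this (a sketch based on the LlamaDebugHandler docs linked above; exactly what the event payloads contain is an assumption worth verifying):

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, callback_manager=callback_manager
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Each recorded pair is (start event, end event) for one embedding call;
# the payloads record what was embedded and should point at the model used.
pairs = llama_debug.get_event_pairs(CBEventType.EMBEDDING)
print(pairs[0][0].payload)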
Yeah, we always use the service context model for embedding, never the database's.
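That matches how the Chroma integration behaves: LlamaIndex hands Chroma precomputed vectors, and Chroma only runs its own embedding function when it receives raw documents without embeddings. A sketch of the distinction, using hypothetical values with the standard chromadb API:

# Precomputed vectors supplied: Chroma stores them as-is, and the
# collection's own embedding function is never invoked.
chroma_collection.add(
    ids=["node-1"],
    embeddings=[[0.1, 0.2, 0.3]],
    documents=["some chunk text"],
)

# No embeddings supplied: only now would Chroma fall back to the
# collection's embedding function to embed the documents itself.
chroma_collection.add(ids=["node-2"], documents=["another chunk"])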