Just a quick question: if we create a service context with ServiceContext.from_defaults() to pass in a different sentence-transformers model, will VectorStoreIndex.from_documents() create the embedding data using that model when we pass in documents along with that ServiceContext object? Specifically when using ChromaDB.
Yes, if you want to use a different embed model than the default OpenAI one, you need to pass it in the service context, and all the embeddings will be created using the newly defined embedding model.
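A minimal sketch of that setup, assuming a pre-0.10 llama-index release (where ServiceContext still exists) and using BAAI/bge-small-en-v1.5 purely as an example sentence-transformers checkpoint:

from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding

# Any sentence-transformers model name works here; this one is just an example.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Indexes built with this service context will embed with the model above
# instead of the default OpenAI text-embedding-ada-002.
service_context = ServiceContext.from_defaults(embed_model=embed_model)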
Just to make sure, in this code where I'm using a Chroma collection with a name:

import chromadb
from llama_index import ServiceContext, StorageContext
from llama_index.vector_stores import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db")  # any Chroma client works here
chroma_collection = db.get_or_create_collection("chromadbcollectionname")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

even if I don't provide an embedding model in db.get_or_create_collection(), it will still create the embedding data for ChromaDB, correct?
When it's creating the collection (not just getting an existing one):

from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context, show_progress=False
)

with this final code, will it add the embedding data?
I think this method creates the collection in the DB.
For the Chroma API it creates the DB, and generally when using a custom model you have to pass it in.
When this code runs:

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context, show_progress=False
)

that is the part where the embeddings are generated.
And since you are passing the embed model from your side, it will use only that one to create the embeddings.
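If you want to double-check that the vectors actually landed in Chroma, a quick sanity check (a sketch, assuming the chroma_collection object from the snippet above) is to count and peek at the collection after building the index:

# After VectorStoreIndex.from_documents(...) has run:
print(chroma_collection.count())  # number of stored vectors; should match your chunk count

record = chroma_collection.peek(limit=1)  # peek() includes embeddings by default
print(len(record["embeddings"][0]))  # the dimension hints at the model, e.g. 384/768
                                     # for many sentence-transformers models vs 1536 for ada-002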
Okay, thanks.
I couldn't tell whether LlamaIndex was using the embed model passed into the service_context or the default ChromaDB embed model from the storage_context (since the ChromaVectorStore was built without passing in an embedding model).
You could try setting your service context as the global one. That way, if the storage context checks for a service context, it will pick up your globally defined one; see the sketch below.
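A sketch of that, assuming the set_global_service_context helper from the same pre-0.10 llama-index releases:

from llama_index import ServiceContext, set_global_service_context

service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Any component constructed without an explicit service_context will now
# fall back to this one, so your embed model is used everywhere.
set_global_service_context(service_context)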

Otherwise, I think it's best to check the source code to verify whether it uses the model you defined or its own.
Also, I found that you can set up the LlamaDebugHandler and check the embedding info there:

https://gpt-index.readthedocs.io/en/latest/examples/callbacks/LlamaDebugHandler.html

Maybe when you check for CBEventType.EMBEDDING you will get info on which embed model is being used. Just a hunch though 😅
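Roughly like this (a sketch based on the LlamaDebugHandler docs linked above; exactly what the event payloads contain is an assumption worth verifying):

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, callback_manager=callback_manager
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Each recorded pair is (start event, end event) for one embedding call;
# the payloads record what was embedded and should point at the model used.
pairs = llama_debug.get_event_pairs(CBEventType.EMBEDDING)
print(pairs[0][0].payload)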
Yeah, we always use the service context model for embedding, never the database's.
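That matches how the Chroma integration behaves: LlamaIndex hands Chroma precomputed vectors, and Chroma only runs its own embedding function when it receives raw documents without embeddings. A sketch of the distinction, using hypothetical values with the standard chromadb API:

# Precomputed vectors supplied: Chroma stores them as-is, and the
# collection's own embedding function is never invoked.
chroma_collection.add(
    ids=["node-1"],
    embeddings=[[0.1, 0.2, 0.3]],
    documents=["some chunk text"],
)

# No embeddings supplied: only now would Chroma fall back to the
# collection's embedding function to embed the documents itself.
chroma_collection.add(ids=["node-2"], documents=["another chunk"])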