How do I use a custom embed model for the Supabase Vector Store? The default is OpenAI's text-embedding-ada-002, which has embedding dimension 1536.
I want to use sentence-transformers/all-mpnet-base-v2, which has embedding dimension 768.

code:
Plain Text
# imports (not shown in the original post; assuming the legacy ServiceContext-era
# LlamaIndex API and LangChain's HuggingFaceEmbeddings wrapper)
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.vector_stores import SupabaseVectorStore
from langchain.embeddings import HuggingFaceEmbeddings


def get_query_engine_supabase(llm, filename):
    # use HuggingFace embeddings

    print("-----LOGGING----- start query_engine - SUPABASE")
    embed_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    # create a service context
    service_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )

    # set_global_service_context(service_context)

    # load documents
    documents = SimpleDirectoryReader(
        input_files=[f"./docs/{filename}"]
    ).load_data()

    DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>" # I HAVE THESE SET

    print("-----LOGGING----- initializing vector_store")

    vector_store = SupabaseVectorStore(
        postgres_connection_string=DB_CONNECTION, 
        collection_name='reviewIndexes',
        dimension='768',
    )
    # TRIED dimension=768 ABOVE, NO LUCK

    print("-----LOGGING----- initialized vector_store")

    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    print("-----LOGGING----- initialized storage_context") # IT DOES EXECUTE TILL HERE

    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        service_context=service_context,
    )

    print("-----LOGGING----- generated index:",index)

    # set up query engine
    query_engine = index.as_query_engine(
        streaming=True,
        similarity_top_k=1
    )
    return query_engine

ERROR:
Plain Text
 raise ValueError('expected %d dimensions, not %d' % (dim, len(value)))
sqlalchemy.exc.StatementError: (builtins.ValueError) expected 1536 dimensions, not 768
[SQL: INSERT INTO vecs."reviewIndexes" (id, vec, metadata) VALUES (%(id_m0)s, %(vec_m0)s,
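
The mismatch suggests the existing vecs collection was created for 1536-dim vectors. One way to confirm what the table currently expects (a sketch; it reuses DB_CONNECTION from the code above, the schema/table/column names come from the error message, and sqlalchemy is assumed to be installed since it already appears in the stack trace):
Plain Text
from sqlalchemy import create_engine, text

engine = create_engine(DB_CONNECTION)
with engine.connect() as conn:
    # format_type renders the declared column type, e.g. 'vector(1536)'
    row = conn.execute(text(
        "SELECT format_type(atttypid, atttypmod) "
        "FROM pg_attribute "
        "WHERE attrelid = 'vecs.\"reviewIndexes\"'::regclass "
        "AND attname = 'vec'"
    )).fetchone()
    print(row[0])  # e.g. vector(1536): the collection still expects 1536-dim embeddings
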
4 comments
Plain Text
 vector_store = SupabaseVectorStore(
        postgres_connection_string=DB_CONNECTION, 
        collection_name='reviewIndexes',
        dimension=768,
    )


The data type mentioned in the docs is int. Can you try with this once?
Also, verify whether the embedding model you are using actually supports this dimension (a quick way to check is sketched after the links below).

https://github.com/run-llama/llama_index/blob/9ba693d4913281795675e08bf5658a5ceb8f4ab4/llama_index/vector_stores/supabase.py#L46


For embedding model reference: https://huggingface.co/spaces/mteb/leaderboard
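
For a quick local sanity check of the model's output dimension (a sketch using the sentence-transformers package directly; not something from the thread):
Plain Text
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# all-mpnet-base-v2 produces 768-dimensional sentence embeddings
print(model.get_sentence_embedding_dimension())  # 768
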
Regarding "the docs say it should be an int": I tried passing dimension as an int, no luck.

I'm using Hugging Face's all-mpnet-base-v2 model, which has 768 embedding dims.

The Supabase Vector Store by default uses OpenAI's text-embedding-ada-002, which has embedding dimension 1536.

I tried adding the dimension to SupabaseVectorStore, as LlamaIndex mentions it in the docs (link below), and passed a service_context with the llm (Llama 2) and embed model (all-mpnet-base-v2) alongside the storage_context when building the index (see the code above).

Docs: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html#create-an-index-backed-by-supabase-s-vector-store
Do you get the same error, or what's the exact issue? Setting dimension=768 is the right thing to do, but you'll also have to make sure it goes into a new table/collection: the existing vecs."reviewIndexes" table was created for 1536-dim vectors, so inserts of 768-dim vectors into it will keep failing.
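
A minimal sketch of that last point, assuming the vecs Python client (which SupabaseVectorStore uses under the hood) and its create_client/delete_collection calls; the collection name reviewIndexes_768 is made up here:
Plain Text
import vecs
from llama_index.vector_stores import SupabaseVectorStore

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

vx = vecs.create_client(DB_CONNECTION)

# Option 1: drop the old 1536-dim collection so it gets recreated at 768 dims
# on the next insert (delete_collection is assumed from the vecs client API)
vx.delete_collection("reviewIndexes")

# Option 2: leave the old table alone and point the store at a fresh collection
vector_store = SupabaseVectorStore(
    postgres_connection_string=DB_CONNECTION,
    collection_name="reviewIndexes_768",  # hypothetical new name
    dimension=768,
)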