Updated 6 months ago

How to use a custom embed model for Supabase Vector Store

At a glance

The community member is trying to use a custom embedding model (sentence-transformers/all-mpnet-base-v2) with 768 dimensions for the Supabase Vector Store, which by default uses OpenAI's text-embedding-ada-002 with 1536 dimensions. The community member is encountering an error stating that the expected dimension is 1536, not 768.

The comments suggest setting the dimension to 768 as an integer, as documented for LlamaIndex's SupabaseVectorStore. However, this did not work. The community member also confirmed that the embedding model they are using (all-mpnet-base-v2) produces 768-dimensional embeddings.

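No confirmed solution was posted. The last reply says that setting dimension=768 is correct, but that it must be paired with a new table/collection; the thread does not record whether that resolved the error.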

How can I use a custom embedding model with the Supabase Vector Store? The default is OpenAI's text-embedding-ada-002, which has embedding dimension 1536.
I want to use sentence-transformers/all-mpnet-base-v2, which has embedding dimension 768.

Code:
```python
# Imports assume the llama_index 0.8.x-era package layout and LangChain's
# HuggingFaceEmbeddings (the class name used below).
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.vector_stores import SupabaseVectorStore


def get_query_engine_supabase(llm, filename):
    print("-----LOGGING----- start query_engine - SUPABASE")

    # use Huggingface embeddings (768-dimensional output)
    embed_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    # create a service context with the custom LLM and embed model
    service_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )

    # set_global_service_context(service_context)

    # load documents
    documents = SimpleDirectoryReader(
        input_files=[f"./docs/{filename}"]
    ).load_data()

    DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"  # I HAVE THESE SET

    print("-----LOGGING----- initializing vector_store")

    vector_store = SupabaseVectorStore(
        postgres_connection_string=DB_CONNECTION,
        collection_name='reviewIndexes',
        dimension='768',
    )
    # TRIED dimension=768 (int) ABOVE TOO, NO LUCK

    print("-----LOGGING----- initialized vector_store")

    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    print("-----LOGGING----- initialized storage_context")  # IT DOES EXECUTE TILL HERE

    # the error below is raised here, when embeddings are inserted
    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        service_context=service_context,
    )

    print("-----LOGGING----- generated index:", index)

    # set up query engine
    query_engine = index.as_query_engine(
        streaming=True,
        similarity_top_k=1
    )
    return query_engine
```

ERROR:
```
 raise ValueError('expected %d dimensions, not %d' % (dim, len(value)))
sqlalchemy.exc.StatementError: (builtins.ValueError) expected 1536 dimensions, not 768
[SQL: INSERT INTO vecs."reviewIndexes" (id, vec, metadata) VALUES (%(id_m0)s, %(vec_m0)s,
```
4 comments
```python
vector_store = SupabaseVectorStore(
    postgres_connection_string=DB_CONNECTION,
    collection_name='reviewIndexes',
    dimension=768,
)
```


The docs say the data type should be int.
Can you try with this once?
Also verify that the embedding model you are using supports this dimension.

https://github.com/run-llama/llama_index/blob/9ba693d4913281795675e08bf5658a5ceb8f4ab4/llama_index/vector_stores/supabase.py#L46


For embedding model ref: https://huggingface.co/spaces/mteb/leaderboard
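Following the suggestion to verify the embedding model's dimension, one way to fail fast is to compare the model's actual output length with the `dimension` passed to `SupabaseVectorStore` before inserting anything. This is a minimal sketch: `check_embedding_dimension` and `fake_mpnet` are hypothetical helpers (not part of LlamaIndex), and `embed` stands in for any function that returns one embedding for a string.

```python
from typing import Callable, Sequence, Union


def check_embedding_dimension(
    dimension: Union[int, str],
    embed: Callable[[str], Sequence[float]],
    probe_text: str = "dimension probe",
) -> int:
    """Hypothetical pre-flight check: return the dimension as an int, or raise
    if the embedding function produces vectors of a different length."""
    # Accept "768" as well as 768, but hand an int on to the vector store.
    expected = int(dimension)
    actual = len(embed(probe_text))
    if actual != expected:
        raise ValueError(
            f"embed model returns {actual}-dim vectors, "
            f"but the store is configured for {expected}"
        )
    return expected


# Hypothetical stand-in for all-mpnet-base-v2, which outputs 768-dim vectors.
def fake_mpnet(text: str) -> list:
    return [0.0] * 768


print(check_embedding_dimension("768", fake_mpnet))  # 768
```

With the question's setup, `embed_model.embed_query` (a method of LangChain's `HuggingFaceEmbeddings`) could be passed as `embed`. Because all-mpnet-base-v2 does return 768-dimensional vectors, this check would pass, which suggests the 1536 in the error comes from the stored collection rather than from the model.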
Replying to "the docs says it to be int type": I tried with dimension as an int, no luck.

I'm using HuggingFace's all-mpnet-base-v2 model, which has 768 embedding dimensions.

The Supabase vector store by default uses OpenAI's text-embedding-ada-002, which has embedding dimension 1536.

I tried adding the dimension to SupabaseVectorStore, as the LlamaIndex docs mention (#), and passed a service_context containing the LLM (Llama2) and the embed model (all-mpnet-base-v2) to VectorStoreIndex.from_documents alongside the storage_context (refer to the code).

(#) https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html#create-an-index-backed-by-supabase-s-vector-store

Do you get the same error? Or what's the exact issue? Setting dimension=768 is the right thing to do, but you'll also have to make sure it's a new table/collection.
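To illustrate why a new table/collection matters, here is a toy in-memory model of a get-or-create pattern in which a collection's dimension is fixed when it is first created. `ToyClient` and `ToyCollection` are hypothetical classes, not the real `vecs` or LlamaIndex API, and whether the real store behaves exactly like this depends on the library versions. The sketch assumes `reviewIndexes` was first created with the default 1536 (for example in an earlier run before switching models): reopening it by name ignores the new `dimension=768`, and the insert fails with the same message as in the question, while a fresh collection name works.

```python
class ToyCollection:
    def __init__(self, name: str, dimension: int):
        self.name = name
        self.dimension = dimension  # fixed at creation, like a vector(1536) column
        self.rows = []

    def upsert(self, vec: list) -> None:
        # Same length check and message as the error in the question.
        if len(vec) != self.dimension:
            raise ValueError(
                "expected %d dimensions, not %d" % (self.dimension, len(vec))
            )
        self.rows.append(vec)


class ToyClient:
    def __init__(self):
        self.collections = {}

    def get_or_create(self, name: str, dimension: int) -> ToyCollection:
        # An existing collection is returned as-is; `dimension` is only used
        # when the collection does not exist yet.
        if name not in self.collections:
            self.collections[name] = ToyCollection(name, dimension)
        return self.collections[name]


client = ToyClient()
client.get_or_create("reviewIndexes", 1536)  # earlier run with the default

old = client.get_or_create("reviewIndexes", 768)  # 768 is silently ignored
try:
    old.upsert([0.0] * 768)
except ValueError as err:
    print(err)  # expected 1536 dimensions, not 768

new = client.get_or_create("reviewIndexes_768", 768)  # hypothetical new name
new.upsert([0.0] * 768)
print(new.dimension, len(new.rows))  # 768 1
```

In the real setup, the analogous fix would be to pass a collection name that does not exist yet (such as the hypothetical `'reviewIndexes_768'`) together with `dimension=768` to `SupabaseVectorStore`, or to drop the old 1536-dimension collection before re-indexing.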