Can someone offer any insight into why my specified HF embeddings do not work? I use one .py file for indexing, with the following content (only the parts relevant to the question), and the app retrieves the data from the same pgvector DB.
App part:
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=db_password,
    port=url.port,
    user=db_user,
    table_name=table_name,
    embed_dim=1024,  # HF embedding dimension
)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
Separate indexer .py file:
vector_store_connection_string = f"postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
url = make_url(vector_store_connection_string)
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=db_password,
    port=url.port,
    user=db_user,
    table_name=table_name,
    embed_dim=1024,  # HF embedding dimension
)
# Create the storage context and index
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
print("Indexing complete. Data stored in PGVector.")
The error:
DataError: (psycopg2.errors.DataException) different vector dimensions 1536 and 1024 [SQL: SELECT public.data_building_permits_data.id, public.data_building_permits_data.node_id
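One check that might narrow this down is comparing the length of a vector produced by the embedding model with the embed_dim passed to PGVectorStore, in both scripts. For reference, 1536 is the vector size of OpenAI's text-embedding-ada-002, which LlamaIndex falls back to when no embedding model is set, while bge-m3 produces 1024-dimensional vectors, so one of the two scripts may not be picking up the HF model. The sketch below is only an illustration: `check_embed_dim` is a hypothetical helper (not part of LlamaIndex), and `fake_embed` is a stand-in for a real call such as `Settings.embed_model.get_text_embedding`.

```python
from typing import Callable, List


def check_embed_dim(embed_fn: Callable[[str], List[float]], expected_dim: int) -> int:
    """Embed a probe string and compare its length with the table's vector size."""
    actual_dim = len(embed_fn("dimension probe"))
    if actual_dim != expected_dim:
        raise ValueError(
            f"embedding model returns {actual_dim}-d vectors, "
            f"but the vector store expects {expected_dim}"
        )
    return actual_dim


# Stand-in embedder (assumption, for illustration only); in the real scripts
# this would be something like Settings.embed_model.get_text_embedding.
def fake_embed(text: str) -> List[float]:
    return [0.0] * 1024


print(check_embed_dim(fake_embed, expected_dim=1024))
```

Running the same check in the indexer and in the app, with the real embedding model in place of `fake_embed`, should show which side is producing vectors of the wrong size.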
Any insight is appreciated.