Hey team,
I used a custom model to create multilingual embeddings for a DocumentSummaryIndex:
from langchain.embeddings import SentenceTransformerEmbeddings
embed_model = SentenceTransformerEmbeddings(model_name="intfloat/multilingual-e5-large")
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, llm=llm, embed_model=embed_model
)
urls = get_urls_from_xlsx_file(filepath)
documents = reader.load_data(urls)
response_synthesizer = get_response_synthesizer(use_async=True)
index = GPTDocumentSummaryIndex.from_documents(
    documents,
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
index.storage_context.persist(save_index_path)
However, when I tried to query it, I got this error:
    product = np.dot(embedding1, embedding2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (1536,) and (1024,) not aligned: 1536 (dim 0) != 1024 (dim 0)
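For what it's worth, the two sizes look like they come from different embedding models: 1024 is the output size of intfloat/multilingual-e5-large, while 1536 is, as far as I know, the size of OpenAI's text-embedding-ada-002 vectors. Here is a minimal sketch of the kind of check that fails. The `dot` helper is hypothetical (a pure-Python stand-in for np.dot), and the zero vectors are placeholders, not my real embeddings:

```python
def dot(embedding1, embedding2):
    """Hypothetical pure-Python stand-in for np.dot on two 1-D vectors."""
    if len(embedding1) != len(embedding2):
        # Same condition that makes np.dot raise in the traceback above.
        raise ValueError(
            f"shapes ({len(embedding1)},) and ({len(embedding2)},) not aligned"
        )
    return sum(a * b for a, b in zip(embedding1, embedding2))


# Placeholder vectors: only the lengths matter here, not the values.
query_embedding = [0.0] * 1536   # assumed: produced by a 1536-dim model
stored_embedding = [0.0] * 1024  # size of multilingual-e5-large vectors

try:
    dot(query_embedding, stored_embedding)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So it seems that somewhere one side of the comparison is not being embedded with my custom model.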
Query code:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
embed_model = SentenceTransformerEmbeddings(model_name="intfloat/multilingual-e5-large")
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, llm=llm, embed_model=embed_model
)
index = load_index(index_path)
query_engine = index.as_query_engine(
    verbose=True,
    retriever_mode="embedding",
    service_context=service_context,
)
while True:
    prompt = input("Type prompt...")
    response = query_engine.query(prompt)
    print(response)
Can someone please point me to the root cause of this issue?