Embeddings

I'm trying this but getting an error:
ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)
Did you already create an index with openai embeddings and then switch to huggingface?

The embeddings from different models can't mix, or you get this error 👀
Just need to make sure you start with a new index when switching embed_model
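For context on the two numbers in the error: OpenAI's default text-embedding-ada-002 vectors are 1536-dimensional, while sentence-transformers/all-mpnet-base-v2 produces 768-dimensional vectors, so the similarity computation fails on the shape mismatch. A minimal NumPy sketch of the same failure, purely for illustration:

Plain Text
import numpy as np

query_embedding = np.random.rand(1536)   # e.g. an OpenAI ada-002 embedding
stored_embedding = np.random.rand(768)   # e.g. an all-mpnet-base-v2 embedding

# the dot product in the similarity calculation raises:
# ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)
np.dot(query_embedding, stored_embedding)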
No! But I found an interesting case. Not sure why it is happening as of now, but here it is:

The combination of HuggingFace embeddings and OpenAI for response generation works when I'm not storing the generated index and then loading it back from storage.

Plain Text
#custom knowledge
from llama_index import (
    GPTVectorStoreIndex,
    LangchainEmbedding,
    StorageContext,
    load_index_from_storage,
    SimpleDirectoryReader,
)
from llama_index import ServiceContext, LLMPredictor
import os
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

# OpenAI gpt-3.5-turbo is used only for response generation
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))

# local HuggingFace model (768-dimensional) is used for the embeddings
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512, embed_model=embed_model)

documents = SimpleDirectoryReader(input_files=["path to one doc"]).load_data()
open_index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = open_index.as_query_engine(similarity_top_k=3, service_context=service_context)

response = query_engine.query("summarize this document")
print(response.response)


This works!!!
But at the same time, if I:
  • create the index
  • store it
  • then load it
and use the loaded index to query,

it gives me ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)

Here's the code for the same:

Plain Text
#custom knowledge
from llama_index import (
    GPTVectorStoreIndex,
    LangchainEmbedding,
    StorageContext,
    load_index_from_storage,
    SimpleDirectoryReader,
)
from llama_index import ServiceContext, LLMPredictor
import os
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

# OpenAI gpt-3.5-turbo for response generation
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))

# local HuggingFace model (768-dimensional) for the embeddings
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512, embed_model=embed_model)

documents = SimpleDirectoryReader(input_files=["path to a doc"]).load_data()
open_index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# persist the index, then load it back from disk
open_index.storage_context.persist(persist_dir="./hff_Storage")
storage_context = StorageContext.from_defaults(persist_dir="./hff_Storage")
index = load_index_from_storage(storage_context=storage_context)

query_engine = index.as_query_engine(similarity_top_k=3, service_context=service_context)

response = query_engine.query("summarize this document")
print(response.response)
I know I'm loading the index even though I already have it!

The point is that this combination does not work when I'm loading the index from the storage context.
Could you try this at your end once and let me know if it works or not!
Thanks
What are the advantages of this combination, and for what kind of data? Thank you.
It's just that I'm thinking my entire data will not be passed to OpenAI services.
In the actual setup, I'm going to have around 150 docs on which I will create the index. So I'm just checking that only the response generation part is passed to OpenAI, and not the entire vector generation part.
And what about token consumption, is it acceptable?
While generating the response? Yes, as it will not be that much compared to generating the embeddings.
Yes, when generating the response. Well, with embeddings, are there any options to minimize the expense? I'm looking for a middle ground between the cost of the answer, the quality of the answer, and the cost of generating embeddings 🙂
You could use HuggingFace models if you want to cut token consumption entirely. But OpenAI responses are considered the best among all the LLMs out there.

That is why I'm trying to see whether combining HF and OpenAI works for me or not.
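As a rough illustration of the all-local alternative mentioned above, here is a minimal sketch using the same old llama_index/LangChain APIs; langchain's HuggingFacePipeline wrapper and the flan-t5-base model are just illustrative choices, not something anyone in this thread actually ran:

Plain Text
# hypothetical all-local setup: no tokens sent to OpenAI at all
from langchain.llms import HuggingFacePipeline
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import GPTVectorStoreIndex, LangchainEmbedding, LLMPredictor, ServiceContext, SimpleDirectoryReader

# local LLM for response generation (quality is typically well below gpt-3.5-turbo)
local_llm = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-base",
    task="text2text-generation",
)
llm_predictor = LLMPredictor(llm=local_llm)

# same local embedding model as before (768-dimensional)
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

documents = SimpleDirectoryReader(input_files=["path to one doc"]).load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine(service_context=service_context).query("summarize this document"))

The trade-off is exactly the one being discussed: lower cost and no data leaving your machine, at the price of answer quality.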
I will test it, but the quality of the answers is very important; otherwise, why do all this? I use: nodes, GPTVectorStoreIndex, and SentenceEmbeddingOptimizer.
The SBERT embeddings that I'm using are also great. Yes, better responses are the key, but the data I'll be working with will be huge in number. So I just want to check whether this can work for me or not.
Do check both scenarios. My issue was that once we store the embeddings and then load them back, it fails while generating the response.


https://discord.com/channels/1059199217496772688/1109093304575983738/1110139260142620742
Yea this should definitely work. Let me see if I have time to debug later today 💪
Guys, can you share the working code later? Thanks
Hey, were any of you able to try the above code?
Hi. I haven't had time yet.
I just tried now with a very minimal example, but I wasn't able to reproduce 👀

Plain Text
>>> from langchain.embeddings.huggingface import HuggingFaceEmbeddings
>>> from llama_index import GPTVectorStoreIndex, LangchainEmbedding, ServiceContext, StorageContext, load_index_from_storage, Document
>>>
>>> embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
>>> service_context = ServiceContext.from_defaults(embed_model=embed_model)
>>> doc = Document("this is a document lol!")
>>>
>>> new_index = GPTVectorStoreIndex.from_documents([doc], service_context=service_context)
>>> new_index.as_query_engine().query("hello world")
Response(response='\nHello World!', source_nodes=[NodeWithScore(node=Node(text='this is a document lol!', doc_id='7cace66a-1302-41ef-8fa6-98e6cf6feac3', embedding=None, doc_hash='57e74d18803a15a129af5ba1f71081081f50b4e7007689bd4205c0be84063aad', extra_info=None, node_info={'start': 0, 'end': 23}, relationships={<DocumentRelationship.SOURCE: '1'>: '1852954d-a584-4c8c-8f6d-201e901b0765'}), score=0.1624280677241592)], extra_info={'7cace66a-1302-41ef-8fa6-98e6cf6feac3': None})
>>>
>>> new_index.storage_context.persist(persist_dir="./newer")
>>> 
>>> newer_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./newer"), service_context=service_context)
>>> newer_index.as_query_engine().query("hello world")
Response(response='\nHello World!', source_nodes=[NodeWithScore(node=Node(text='this is a document lol!', doc_id='7cace66a-1302-41ef-8fa6-98e6cf6feac3', embedding=None, doc_hash='57e74d18803a15a129af5ba1f71081081f50b4e7007689bd4205c0be84063aad', extra_info=None, node_info={'start': 0, 'end': 23}, relationships={<DocumentRelationship.SOURCE: '1'>: '1852954d-a584-4c8c-8f6d-201e901b0765'}), score=0.1624280677241592)], extra_info={'7cace66a-1302-41ef-8fa6-98e6cf6feac3': None})
>>> 
Yes, it worked! And I found the missing part in my code as well: while loading the vectors, I was not passing the service_context.
Thanks a ton @Logan M
Nice, glad it works! :dotsCATJAM:
In the default case, I guess it would work without passing it, but since I'm using ChatOpenAI, it was required.
Did your code work? 🙂
Had to pass the service context while loading from storage, since we are not using the default service context.
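To summarize the fix from this thread: when the index was built with a non-default embed_model, the same service_context has to be passed again when loading it back, otherwise llama_index falls back to the default OpenAI embeddings (1536 dimensions) and clashes with the stored 768-dimensional vectors. A minimal sketch, reusing the service_context from the snippets above:

Plain Text
# rebuild the storage context from the persisted directory
storage_context = StorageContext.from_defaults(persist_dir="./hff_Storage")

# pass the same service_context (HuggingFace embed_model + OpenAI LLM) used at build time,
# so queries are embedded with the 768-dim model instead of the default OpenAI one
index = load_index_from_storage(storage_context=storage_context, service_context=service_context)

query_engine = index.as_query_engine(similarity_top_k=3, service_context=service_context)
print(query_engine.query("summarize this document").response)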