
Updated last year

Embeddings

At a glance

A community member ran the custom embeddings code from the GPT Index documentation as-is and hit errors around the abstract class InstructorEmbeddings. Others suggested setting self._model after calling super().__init__() and checking the source of existing embeddings, such as the one in the LlamaIndex repository, noting that the example code was out of date and needed updating. Further troubleshooting surfaced additional errors about missing fields on the InstructorEmbeddings class. Eventually an updated notebook with the custom embeddings example was shared, and a follow-up question covered the default embedding model used by LlamaIndex.

Useful resources
hey all,

I ran the custom embeddings code here as is: https://gpt-index.readthedocs.io/en/stable/examples/embeddings/custom_embeddings.html

and I got this:

Plain Text
Traceback (most recent call last):
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 40, in <module>
    embed_model=InstructorEmbeddings(embed_batch_size=2), chunk_size=512
TypeError: Can't instantiate abstract class InstructorEmbeddings with abstract methods _aget_query_embedding, class_name


then I put in stub implementations:

Plain Text
def class_name(self) -> str:
    return "InstructorEmbeddings"

async def _aget_query_embedding(self, query: str) -> List[float]:
    return self._get_query_embedding(query)



and got this:

Plain Text
Traceback (most recent call last):
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 46, in <module>
    embed_model=InstructorEmbeddings(embed_batch_size=2), chunk_size=512
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 19, in __init__
    self._model = INSTRUCTOR(instructor_model_name)
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_model"
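The "object has no field" error comes from pydantic's strict `__setattr__`: on a `BaseModel` subclass, assigning any attribute that was never declared as a field (or private attribute) is rejected. A minimal pure-Python stand-in for that behavior (illustrative only, not pydantic's actual implementation):

```python
from typing import Any


class StrictModel:
    """Toy stand-in for pydantic v1's BaseModel: only declared fields may be set."""

    __fields__ = ("embed_batch_size",)

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in self.__fields__:
            # This mirrors the ValueError seen in the traceback above.
            raise ValueError(f'"{type(self).__name__}" object has no field "{name}"')
        object.__setattr__(self, name, value)


class InstructorEmbeddings(StrictModel):
    def __init__(self) -> None:
        # _model was never declared, so the strict __setattr__ rejects it,
        # just like pydantic rejects self._model in the real class.
        self._model = "INSTRUCTOR(...)"
```

The fix, then, is to declare the attribute up front (pydantic provides `PrivateAttr` for underscore-prefixed attributes) rather than assigning it ad hoc in `__init__`.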
25 comments
Easiest fix is you need to set self._model after running super().__init__(...)
Sorry the example is slightly out of date, need to update that
You could also check out the source code from some existing embeddings for an example

https://github.com/jerryjliu/llama_index/blob/main/llama_index/embeddings/langchain.py
in this repo, is self._model being set?

Plain Text
super().__init__(
    embed_batch_size=embed_batch_size,
    callback_manager=callback_manager,
    model_name=model_name,
)

@classmethod
def class_name(cls) -> str:
    """Get class name."""
    return "LangchainEmbedding"
I don't see it anywhere
bruh sorry for pleb question, I'm literally learning Python on the fly lolll
ah I could put this:

Plain Text
self._model = INSTRUCTOR(instructor_model_name)


after super().__init__() I getcha
lemme try that real quick
didn't work 😦

Plain Text
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
        self._instruction = instruction
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]
oh wait forgot to implement the stubs again, one sec
ok RIP new error:

Plain Text
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 18, in __init__
    self._instruction = instruction
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_instruction"
lemme just nuke that field then...
led me back to this:

Plain Text
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_model"
with this code:

Plain Text
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]
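For reference, here is a sketch of the pattern that avoids both "has no field" errors: declare the underscore attributes with pydantic's `PrivateAttr` so `__setattr__` accepts them, then assign after `super().__init__()`. This is an assumption-laden sketch against pydantic directly (not the real `BaseEmbedding`), and `model_loader` is a hypothetical placeholder standing in for `INSTRUCTOR`, since loading real model weights is out of scope:

```python
from typing import Any, Callable

from pydantic import BaseModel, PrivateAttr


class InstructorEmbeddings(BaseModel):
    # A regular pydantic field, standing in for what BaseEmbedding declares.
    embed_batch_size: int = 10

    # Private attributes must be declared; otherwise pydantic's __setattr__
    # raises: '"InstructorEmbeddings" object has no field "_model"'.
    _model: Any = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        # Hypothetical hook replacing INSTRUCTOR(instructor_model_name).
        model_loader: Callable[[str], Any] = lambda name: None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        # Safe now: both names were declared as PrivateAttr above.
        self._model = model_loader(instructor_model_name)
        self._instruction = instruction
```

With the private attributes declared, instantiation like `InstructorEmbeddings(embed_batch_size=2)` succeeds instead of raising.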
I'll go check out that repo you gave and try to extend the BaseEmbedding class that way 🫑
lemme know if you get around to updating that example please πŸ™
I'm catching up on part 4 of
of 'bottoms up tutorial' to be able to get through part 5
appreciate all your πŸ”₯ content and help GRC! πŸ”₯ πŸ™ ❀️ 🫑
my πŸ‘‘ provides πŸ™
Plain Text
@classmethod
def from_defaults(
    cls,
    llm_predictor: Optional[BaseLLMPredictor] = None,
    llm: Optional[LLMType] = "default",
    prompt_helper: Optional[PromptHelper] = None,
    embed_model: Optional[EmbedType] = "default",


Ser I'm looking at this method for creating the service_context, particularly the string "default"

Does this mean that if I don't pass in my own embed model, the default will be from OpenAI?

Trying to determine if every time I run this line:

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

The index is actually just a bunch of vectors with numbers regardless of whether I pass in an embed model?
Yes the default is openai

There are two main functions to pick the embedding model and the llm, resolve_embed_model() and resolve_llm()
And yeah, a vector index is essentially a map of vectors/numbers to their source text
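That "map of vectors/numbers to their source text" idea can be sketched in a few lines. This is a toy illustration of the concept, not LlamaIndex's actual data structure: each entry pairs an embedding with its text, and a query embedding retrieves the nearest entries by cosine similarity:

```python
from math import sqrt
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))


class TinyVectorIndex:
    """Toy vector index: embeddings stored alongside their source text."""

    def __init__(self) -> None:
        self._entries: List[Tuple[List[float], str]] = []

    def add(self, embedding: List[float], text: str) -> None:
        self._entries.append((embedding, text))

    def query(self, embedding: List[float], top_k: int = 1) -> List[str]:
        # Rank stored entries by similarity to the query embedding.
        ranked = sorted(
            self._entries, key=lambda e: cosine(embedding, e[0]), reverse=True
        )
        return [text for _, text in ranked[:top_k]]
```

Whatever embed model you pass in (OpenAI by default, or a custom one like InstructorEmbeddings) is what produces those vectors at both index-build time and query time.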
thank you! ❀️