Embeddings

hey all,

I ran the custom embeddings code here as is: https://gpt-index.readthedocs.io/en/stable/examples/embeddings/custom_embeddings.html

and I got this:

Plain Text
Traceback (most recent call last):
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 40, in <module>
    embed_model=InstructorEmbeddings(embed_batch_size=2), chunk_size=512
TypeError: Can't instantiate abstract class InstructorEmbeddings with abstract methods _aget_query_embedding, class_name


then I put in stub implementations:

Plain Text
    @classmethod
    def class_name(cls) -> str:
        return "InstructorEmbeddings"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)



and got this:

Plain Text
Traceback (most recent call last):
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 46, in <module>
    embed_model=InstructorEmbeddings(embed_batch_size=2), chunk_size=512
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 19, in __init__
    self._model = INSTRUCTOR(instructor_model_name)
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_model"
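
For context, this error comes from pydantic's `BaseModel.__setattr__`, which rejects assignment to any attribute that was never declared on the model. A rough stdlib-only sketch of that behavior (the class and attribute names here are made up for illustration, not pydantic's actual internals):

```python
class StrictModel:
    """Crude imitation of pydantic's BaseModel.__setattr__:
    only names declared up front may be assigned."""
    _allowed = {"embed_batch_size"}

    def __setattr__(self, name, value):
        if name not in self._allowed:
            raise ValueError(
                f'"{type(self).__name__}" object has no field "{name}"'
            )
        object.__setattr__(self, name, value)

class InstructorLike(StrictModel):
    def __init__(self):
        self.embed_batch_size = 2   # fine: declared in _allowed
        self._model = "INSTRUCTOR"  # raises ValueError, like the traceback above
```
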
Easiest fix is you need to set self._model after running super().__init__(...)
Sorry, the example is slightly out of date; I need to update it
You could also check out the source code from some existing embeddings for an example

https://github.com/jerryjliu/llama_index/blob/main/llama_index/embeddings/langchain.py
in this repo, is self._model being set?

Plain Text
super().__init__(
            embed_batch_size=embed_batch_size,
            callback_manager=callback_manager,
            model_name=model_name,
        )

    @classmethod
    def class_name(cls) -> str:
        """Get class name."""
        return "LangchainEmbedding"
I don't see it anywhere
bruh sorry for pleb question, I'm literally learning Python on the fly lolll
ah I could put this:

Plain Text
self._model = INSTRUCTOR(instructor_model_name)


after super().__init__(), I getcha
lemme try that real quick
didn't work 😦

Plain Text
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
        self._instruction = instruction
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]
oh wait forgot to implement the stubs again, one sec
ok RIP new error:

Plain Text
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 18, in __init__
    self._instruction = instruction
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_instruction"
lemme just nuke that field then...
led me back to this:

Plain Text
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_model"
with this code:

Plain Text
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]
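For anyone hitting the same wall: pydantic only lets you assign to attributes that are declared on the model, and underscore-prefixed ones have to be declared with `PrivateAttr`. A sketch of that pattern, using a plain pydantic `BaseModel` as a stand-in for llama_index's `BaseEmbedding` (the `FakeBaseEmbedding` class and the string assigned in place of the real `INSTRUCTOR(...)` model are assumptions for illustration only):

```python
from typing import Any
from pydantic import BaseModel, PrivateAttr

class FakeBaseEmbedding(BaseModel):
    """Stand-in for llama_index's BaseEmbedding, which is also a pydantic model."""
    embed_batch_size: int = 10

class InstructorEmbeddings(FakeBaseEmbedding):
    # Declare the private attributes up front so pydantic's
    # __setattr__ accepts them later.
    _model: Any = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        # Stand-in for INSTRUCTOR(instructor_model_name); the assignment now
        # succeeds because _model is a declared private attribute.
        self._model = f"instructor:{instructor_model_name}"
        self._instruction = instruction
```

With the declarations in place, both the `_model` and `_instruction` assignments go through, whether they run before or after `super().__init__()`.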
I'll go check out that repo you gave and try to extend the BaseEmbedding class that way 🫑
lemme know if you get around to updating that example please πŸ™
I'm catching up on part 4 of the 'bottoms up tutorial' to be able to get through part 5
appreciate all your πŸ”₯ content and help GRC! πŸ”₯ πŸ™ ❀️ 🫑
my πŸ‘‘ provides πŸ™
Plain Text
@classmethod
    def from_defaults(
        cls,
        llm_predictor: Optional[BaseLLMPredictor] = None,
        llm: Optional[LLMType] = "default",
        prompt_helper: Optional[PromptHelper] = None,
        embed_model: Optional[EmbedType] = "default",


Ser I'm looking at this method for creating the service_context, particularly the string "default"

Does this mean that if I don't pass in my own embed model, the default will be from openAI?

Trying to determine if every time I run this line:

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

the index is actually just a bunch of vectors with numbers, regardless of whether I pass in an embed model?
Yes, the default is OpenAI

There are two main functions that pick the embedding model and the LLM: resolve_embed_model() and resolve_llm()
And yeah, a vector index is essentially a map of vectors/numbers to their source text
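To make the "map of vectors to their source text" idea concrete, here's a toy, stdlib-only sketch. The `toy_embed` function is a made-up stand-in (character frequencies, not a real embedding model), but the shape is the same: every stored vector keeps a pointer back to its source text, and a query is answered by embedding it and taking the nearest stored vector.

```python
import math

def toy_embed(text: str) -> list[float]:
    # Made-up "embedding": normalized character-frequency vector.
    vocab = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(a: list[float], b: list[float]) -> float:
    # Plain dot product; equals cosine similarity since toy_embed normalizes.
    return sum(x * y for x, y in zip(a, b))

# The "index": each vector is stored alongside its source text.
docs = ["the cat sat on the mat", "embeddings map text to vectors"]
index = [(toy_embed(d), d) for d in docs]

def query(q: str) -> str:
    qv = toy_embed(q)
    return max(index, key=lambda pair: cosine(pair[0], qv))[1]
```

Swap `toy_embed` for a real embedding model and `index` for a vector store, and that's roughly what `VectorStoreIndex.from_documents(...)` plus a retriever gives you.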
thank you! ❀️