Embeddings

hey all,

I ran the custom embeddings code here as is: https://gpt-index.readthedocs.io/en/stable/examples/embeddings/custom_embeddings.html

and I got this:

Plain Text
Traceback (most recent call last):
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 40, in <module>
    embed_model=InstructorEmbeddings(embed_batch_size=2), chunk_size=512
TypeError: Can't instantiate abstract class InstructorEmbeddings with abstract methods _aget_query_embedding, class_name


then I put in stub implementations:

Plain Text
    @classmethod
    def class_name(cls) -> str:
        return "InstructorEmbeddings"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)



and got this:

Plain Text
Traceback (most recent call last):
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 46, in <module>
    embed_model=InstructorEmbeddings(embed_batch_size=2), chunk_size=512
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 19, in __init__
    self._model = INSTRUCTOR(instructor_model_name)
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_model"
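
For context, this error comes from pydantic's `BaseModel.__setattr__`, which rejects assignment to any attribute that was never declared on the model. A rough stdlib-only sketch of that behavior (the class and attribute names here are made up for illustration, not pydantic's actual internals):

```python
class StrictModel:
    """Crude imitation of pydantic's BaseModel.__setattr__:
    only names declared up front may be assigned."""
    _allowed = {"embed_batch_size"}

    def __setattr__(self, name, value):
        if name not in self._allowed:
            raise ValueError(
                f'"{type(self).__name__}" object has no field "{name}"'
            )
        object.__setattr__(self, name, value)

class InstructorLike(StrictModel):
    def __init__(self):
        self.embed_batch_size = 2   # fine: declared in _allowed
        self._model = "INSTRUCTOR"  # raises ValueError, like the traceback above
```
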
Easiest fix is you need to set self._model after running super().__init__(...)
Sorry, the example is slightly out of date; I need to update it
You could also check out the source code from some existing embeddings for an example

https://github.com/jerryjliu/llama_index/blob/main/llama_index/embeddings/langchain.py
in this repo, is self._model being set?

Plain Text
super().__init__(
            embed_batch_size=embed_batch_size,
            callback_manager=callback_manager,
            model_name=model_name,
        )

    @classmethod
    def class_name(cls) -> str:
        """Get class name."""
        return "LangchainEmbedding"
I don't see it anywhere
bruh sorry for pleb question, I'm literally learning Python on the fly lolll
ah I could put this:

Plain Text
self._model = INSTRUCTOR(instructor_model_name)


after super().__init__(), I getcha
lemme try that real quick
didn't work 😦

Plain Text
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
        self._instruction = instruction
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]
oh wait forgot to implement the stubs again, one sec
ok RIP new error:

Plain Text
  File "/home/bi-ai/ai/bottoms-up-embeddings/main.py", line 18, in __init__
    self._instruction = instruction
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_instruction"
lemme just nuke that field then...
led me back to this:

Plain Text
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "InstructorEmbeddings" object has no field "_model"
with this code:

Plain Text
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]
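For anyone hitting the same wall: pydantic only lets you assign to attributes that are declared on the model, and underscore-prefixed ones have to be declared with `PrivateAttr`. A sketch of that pattern, using a plain pydantic `BaseModel` as a stand-in for llama_index's `BaseEmbedding` (the `FakeBaseEmbedding` class and the string assigned in place of the real `INSTRUCTOR(...)` model are assumptions for illustration only):

```python
from typing import Any
from pydantic import BaseModel, PrivateAttr

class FakeBaseEmbedding(BaseModel):
    """Stand-in for llama_index's BaseEmbedding, which is also a pydantic model."""
    embed_batch_size: int = 10

class InstructorEmbeddings(FakeBaseEmbedding):
    # Declare the private attributes up front so pydantic's
    # __setattr__ accepts them later.
    _model: Any = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        # Stand-in for INSTRUCTOR(instructor_model_name); the assignment now
        # succeeds because _model is a declared private attribute.
        self._model = f"instructor:{instructor_model_name}"
        self._instruction = instruction
```

With the declarations in place, both the `_model` and `_instruction` assignments go through, whether they run before or after `super().__init__()`.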
I'll go check out that repo you gave and try to extend the BaseEmbedding class that way 🫑
lemme know if you get around to updating that example please πŸ™
I'm catching up on part 4 of the 'bottoms up tutorial' to be able to get through part 5
appreciate all your πŸ”₯ content and help GRC! πŸ”₯ πŸ™ ❀️ 🫑
my πŸ‘‘ provides πŸ™
Plain Text
@classmethod
    def from_defaults(
        cls,
        llm_predictor: Optional[BaseLLMPredictor] = None,
        llm: Optional[LLMType] = "default",
        prompt_helper: Optional[PromptHelper] = None,
        embed_model: Optional[EmbedType] = "default",


Ser I'm looking at this method for creating the service_context, particularly the string "default"

Does this mean that if I don't pass in my own embed model, the default will be from openAI?

Trying to determine if every time I run this line:

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

the index is actually just a bunch of vectors with numbers, regardless of whether I pass in an embed model?
Yes, the default is OpenAI

There are two main functions that pick the embedding model and the LLM: resolve_embed_model() and resolve_llm()
And yeah, a vector index is essentially a map of vectors/numbers to their source text
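To make the "map of vectors to their source text" idea concrete, here's a toy, stdlib-only sketch. The `toy_embed` function is a made-up stand-in (character frequencies, not a real embedding model), but the shape is the same: every stored vector keeps a pointer back to its source text, and a query is answered by embedding it and taking the nearest stored vector.

```python
import math

def toy_embed(text: str) -> list[float]:
    # Made-up "embedding": normalized character-frequency vector.
    vocab = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(a: list[float], b: list[float]) -> float:
    # Plain dot product; equals cosine similarity since toy_embed normalizes.
    return sum(x * y for x, y in zip(a, b))

# The "index": each vector is stored alongside its source text.
docs = ["the cat sat on the mat", "embeddings map text to vectors"]
index = [(toy_embed(d), d) for d in docs]

def query(q: str) -> str:
    qv = toy_embed(q)
    return max(index, key=lambda pair: cosine(pair[0], qv))[1]
```

Swap `toy_embed` for a real embedding model and `index` for a vector store, and that's roughly what `VectorStoreIndex.from_documents(...)` plus a retriever gives you.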
thank you! ❀️