
Updated 3 months ago

Embed model

Just a bit confused about what model this embed_model is?
Embedding models are specifically designed to take text and create a numerical representation of it (i.e., a vector)

Then when you query something, the query text is embedded, and using cosine similarity the most similar nodes can be retrieved and sent to the LLM as context to answer that query
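A rough sketch of that retrieval step, using toy 3-dimensional vectors in place of real embeddings (the node names and numbers are made up for illustration; real models produce hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for three stored nodes.
node_vectors = {
    "node_about_cats": np.array([0.9, 0.1, 0.0]),
    "node_about_dogs": np.array([0.8, 0.2, 0.1]),
    "node_about_taxes": np.array([0.0, 0.1, 0.9]),
}

# Embedding of the user's query (here it's "close to" the cat node).
query_vector = np.array([0.85, 0.15, 0.05])

# Rank nodes by similarity to the query; the top hits get sent to the LLM as context.
ranked = sorted(
    node_vectors.items(),
    key=lambda kv: cosine_similarity(query_vector, kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

The same idea scales up: the vector store just keeps one vector per node and ranks them against the query vector.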

In the link I sent, it downloads and runs a local embedding model from huggingface
So it's not a specific LLM per se, like mpt-7b-instruct or stablelm?
Nope, they are completely separate entities
For example, by default, that huggingface embeddings code is downloading this model
https://huggingface.co/sentence-transformers/all-mpnet-base-v2
So I am following this https://gpt-index.readthedocs.io/en/latest/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.html

Why don't you use the StableLM embeddings in this example?

Sorry if this is all in the docs
StableLM doesn't have embeddings (or at least it didn't the last time I checked lol)

Embedding models are trained specifically to be good at creating representations of text
LLMs (like StableLM) are trained to be good at generating text
They don't really interact with each other, either. The embeddings are just a way to retrieve the most relevant text to help the LLM answer a question
Okay, I think I get that piece. I'm just confused about how yours worked but mine didn't, even though I copied and pasted everything into a notebook
and you don't define an OpenAI key anywhere, unless that's implied
Right, it's meant to be a sort of example of setting up the predictor. I had the OpenAI key set in the background 😅
Sorry if that was unclear
Ah okay that makes sense
One last question
So in order to use the vector store, the model must have some corresponding embeddings?
Pretty much. In order to use the vector store, you need an embed_model set up (whether it's from OpenAI, Hugging Face, or something else)
and in this context, as long as this library can go to the Hugging Face page and grab the tokenizer, it should be able to get embeddings for the text?
Yea, exactly 🫡
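To make that concrete, here's a toy stand-in, not the library's actual API, with class and method names made up for illustration, showing the minimal interface a vector store needs from an embed_model (real ones use a trained network instead of word hashing):

```python
import hashlib
import math

class ToyEmbedModel:
    """Hypothetical stand-in for an embed_model: maps text to a fixed-size vector.

    A real embed_model (OpenAI, Hugging Face, etc.) uses a trained network;
    here we just hash words into buckets so the example is runnable.
    """

    dim = 8  # real models use hundreds of dimensions (e.g. 768 for all-mpnet-base-v2)

    def get_text_embedding(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        # Unit-normalised, so a plain dot product equals cosine similarity.
        return [x / norm for x in vec]

# The vector store only needs this one method, applied to documents and queries alike.
embed_model = ToyEmbedModel()
store = {doc: embed_model.get_text_embedding(doc) for doc in ["cats purr", "dogs bark"]}
query_vec = embed_model.get_text_embedding("why do cats purr")
best = max(store, key=lambda doc: sum(a * b for a, b in zip(store[doc], query_vec)))
print(best)
```

Swapping the toy class for a real embedding model changes the quality of the vectors, not the shape of the interaction.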