I implemented this example (https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html#example-using-a-custom-llm-model-advanced), except I am using the index as a chat engine:
Plain Text
# chat_engine = index.as_chat_engine()
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=system_prompt,
    service_context=service_context
)

response = chat_engine.chat("Tell me a joke.")
print(f"Agent: {response}")

but when I put in an input, it returns no output and gives this error:
Plain Text
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

Anyone know why this might be happening?
Edit: now it's giving this error: ValueError: shapes (384,) and (1536,) not aligned: 384 (dim 0) != 1536 (dim 0)
That second error means embeddings are being created with two different models.
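One quick way to see which dimensions are in play is to embed a short test string and check the length. A minimal sketch, assuming the pre-0.10 llama_index API used in this thread; the model name is just an example:
Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

# each embedding model has a fixed output dimension; mixing models in one index breaks retrieval
small = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
print(len(small.get_text_embedding("hello")))  # 384 for bge-small

If the number printed here does not match the dimension of the vectors already stored in the index on disk, queries fail with exactly the shapes-not-aligned error above.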

I'm not sure which LLM you are using (sounds like HuggingFace), but HuggingFace is famous for returning zero output when the input gets too big.
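One way to sanity-check the "input too big" theory is to count the tokens in whatever prompt actually reaches the model and compare against the context window. A rough sketch; prompt_text here is a stand-in for the final prompt string, which is an assumption about what you can get hold of:
Plain Text
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("StabilityAI/stablelm-tuned-alpha-3b")
prompt_text = "..."  # whatever string is ultimately sent to the model
print(len(tok.encode(prompt_text)), "tokens vs. context_window=4096")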
Can you share the LLM implementation?
Plain Text
from transformers import AutoTokenizer
from llama_index import ServiceContext, set_global_service_context, set_global_tokenizer
from llama_index.llms import HuggingFaceLLM

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    # note: temperature has no effect while do_sample=False (greedy decoding)
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",  # hbacard/Nous-Hermes-Llama2-13b-GGUF
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

# use HuggingFace embeddings
from llama_index.embeddings import HuggingFaceEmbedding
# intfloat/e5-mistral-7b-instruct
# BAAI/bge-small-en-v1.5
# BAAI/bge-large-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="intfloat/e5-mistral-7b-instruct")

service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model,
)
set_global_service_context(service_context)
I had embed_model set to "local" before. I just changed it to this one ^ and am trying again.
If you switch embedding models, just make sure you re-create the entire index
That's gotta be it.
I'm loading one from disk that I pre-made.
Yea so that will fix the dim error
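For reference, re-creating the index with the new embedding model might look roughly like this (a sketch in the pre-0.10 llama_index API used above; the data and storage paths are illustrative):
Plain Text
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# rebuild from the raw documents so every stored vector comes from the new embed_model
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
index.storage_context.persist(persist_dir="./storage")  # overwrite the old on-disk index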
I'm not totally sure why the output is empty 🤔 If you are running on CPU, though, it will be very slow.
Thanks! I would not have known to look for that
It should be CUDA, but I don't know how to check whether my pip install included the CUDA build.
Plain Text
import torch
print(torch.cuda.is_available())

That should show whether torch sees your GPU.
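Separately, on the right-padding warning from the first error: one thing worth trying is forwarding padding_side='left' through tokenizer_kwargs when building the HuggingFaceLLM. This is a sketch based on the constructor call shared above, not a confirmed fix:
Plain Text
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    # forwarded to AutoTokenizer.from_pretrained; addresses the right-padding warning
    tokenizer_kwargs={"max_length": 4096, "padding_side": "left"},
)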
Hi @segfault, I am Charlene, a very new newbie to LLMs and all of this. I am struggling to understand which embedding model I should choose for "StabilityAI/stablelm-tuned-alpha-3b". I searched the web many times but was still confused. I thought the LLM has to use the same embeddings as the embed_model, but here you are using something different, "intfloat/e5-mistral-7b-instruct". Would you mind sharing how to know whether two different models' embeddings are compatible? Thank you!
The LLM and the embedding model are completely unrelated. You could have any combination 🙂
Oh @Logan M, thanks for the quick reply! I read "By default LlamaIndex uses text-embedding-ada-002, which is the default embedding used by OpenAI. If you are using different LLMs you will often want to use different embeddings." here, and thus thought there must be some rules:

https://docs.llamaindex.ai/en/stable/understanding/indexing/indexing.html.

So technically, I can use any embedding + LLM combination, even though each maps words into vector space differently? I guess I am failing to understand: if the embed_model maps to a different space than the LLM does, how does the LLM know what is relevant?
Oh, that's a weird sentence to be in the docs. Will make sure that gets removed haha
Embedding models are just used to represent and retrieve text.
Once you have the text, it goes to the LLM.
So you can use any embedding model with any LLM.
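To make that concrete, retrieval and generation are separate steps: the embed_model only decides which chunks of text get pulled back, and the LLM then reads those chunks as plain text. A small sketch against the index from earlier in the thread (the query string is just an example):
Plain Text
# step 1: the embedding model retrieves relevant text chunks
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("What does the document cover?")

# step 2: the LLM answers using those chunks as context
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("What does the document cover?")
print(response)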
got it! Thank you!