# chat_engine = index.as_chat_engine()
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=system_prompt,
    service_context=service_context,
)

response = chat_engine.chat("Tell me a joke.")
print(f"Agent: {response}")
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
ValueError: shapes (384,) and (1536,) not aligned: 384 (dim 0) != 1536 (dim 0)
# imports for this snippet (llama_index legacy / 0.9-style API)
from transformers import AutoTokenizer
from llama_index import ServiceContext, set_global_service_context, set_global_tokenizer
from llama_index.llms import HuggingFaceLLM

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",  # hbacard/Nous-Hermes-Llama2-13b-GGUF
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16},
)

# use HuggingFace embeddings
from llama_index.embeddings import HuggingFaceEmbedding

# intfloat/e5-mistral-7b-instruct
# BAAI/bge-small-en-v1.5
# BAAI/bge-large-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="intfloat/e5-mistral-7b-instruct")

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model=embed_model
)
set_global_service_context(service_context)
import torch
print(torch.cuda.is_available())
The index has to be queried with the same embedding model it was built with, but here you are using something different ("intfloat/e5-mistral-7b-instruct").
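For example, a quick way to see the mismatch concretely is to compare the dimensions of the vectors each model produces. This is only a minimal sketch: the two model names below are just examples (both are mentioned in your snippet), and get_text_embedding simply embeds a single string.

from llama_index.embeddings import HuggingFaceEmbedding

# two candidate embedding models (example names only)
small_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
large_model = HuggingFaceEmbedding(model_name="intfloat/e5-mistral-7b-instruct")

# embed the same text with both models and compare vector lengths
dim_small = len(small_model.get_text_embedding("dimension check"))
dim_large = len(large_model.get_text_embedding("dimension check"))
print(dim_small, dim_large)

# if the query-time dimension differs from the dimension stored in the index,
# retrieval fails with exactly the "shapes not aligned" ValueError shown above

Even when the dimensions happen to match, embeddings from two different models are in general not comparable, so the safest fix is to rebuild the index with the same embed_model you plan to query with.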
Would you mind sharing how to know if two different models' embeddings are compatible? Thank you!