import torch
# import paths assume a recent llama-index install; older 0.9.x releases
# exposed these under `llama_index.prompts` / `llama_index.llms` instead
from llama_index.core import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.llms.huggingface import HuggingFaceLLM

# Trying the huggingface Llama-2 chat model instead to see if inference speeds up!
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_wrapper_prompt = PromptTemplate(
    DEFAULT_TEXT_QA_PROMPT_TMPL,
    prompt_type=PromptType.QUESTION_ANSWER,
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    tokenizer_kwargs={"max_length": 512},
    # float16 weights reduce memory usage when running on CUDA
    model_kwargs={"torch_dtype": torch.float16},
)
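A quick way to check whether this configuration actually speeds things up is to time a single completion before wiring the LLM into the query engine. A minimal sketch (the prompt text is arbitrary, and this assumes GPU access plus permission to the gated meta-llama checkpoint on the Hugging Face Hub):

# hypothetical smoke test: time one completion with the configured LLM
import time

start = time.time()
response = llm.complete("What is retrieval-augmented generation?")
print(response)
print(f"generation took {time.time() - start:.1f}s")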
The query_wrapper_prompt is only a wrapper around the entire internal prompt that llama-index constructs. It is provided so that you have an easy way to format the final prompt for a specific model.
For example, for Llama-2 it might look something like "[INST] {query_str} [/INST] ".
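For reference, a minimal sketch of what such a wrapper could look like for Llama-2 chat (the system-prompt wording here is purely illustrative, not a template that llama-index ships with):

# illustrative Llama-2 chat wrapper; {query_str} is replaced with the entire
# internal prompt that llama-index builds
LLAMA2_QUERY_WRAPPER = (
    "[INST] <<SYS>>\n"
    "You are a helpful assistant. Answer using only the information provided.\n"
    "<</SYS>>\n\n"
    "{query_str} [/INST] "
)
query_wrapper_prompt = PromptTemplate(LLAMA2_QUERY_WRAPPER)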
Thanks @Logan M! How do I include the {context_str} in this case? Since mine is a RAG Q&A use case, I want to provide the LLM with both context_str and query_str.