Hi folks - first time poster; I did a solid search before posting and haven't been able to find a lead on a solution. I'm getting a proper response with Llama2 but an empty response with Vicuna and Claude, using identical LlamaCPP parameters. More details in the thread:
This is my LlamaCPP object:

self.llm = LlamaCPP(
    model_url=self.model_url,
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
I'm guessing it has to do with model_kwargs, the context window, or something along those lines, but honestly I have no idea where to start flipping switches
hmm, I think

  1. you probably need to change the messages_to_prompt and completion_to_prompt functions so that they format things properly for these models (the ones built in to llama-index are only for the llama2 format -- not sure about those other two models); rough sketch further down
  2. maybe change the global tokenizer to match the new models -- usually blank responses happen because the inputs got too big (the tokenizer is used to count tokens)
e.g. for llama2 I might do something like
Plain Text
from llama_index import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)
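and for the first point, Vicuna-style models roughly expect a short system preamble followed by "USER: ... ASSISTANT: ..." turns -- something like the sketch below might work (the exact separators vary between Vicuna versions, so double-check the model card you're using; the vicuna_* function names here are just placeholders I made up):
Plain Text
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def vicuna_messages_to_prompt(messages):
    # messages is a sequence of ChatMessage objects; role should compare
    # equal to "system" / "user" / "assistant"
    prompt = SYSTEM_PROMPT + "\n\n"
    for message in messages:
        if message.role == "system":
            prompt = message.content + "\n\n"
        elif message.role == "user":
            prompt += f"USER: {message.content}\n"
        elif message.role == "assistant":
            prompt += f"ASSISTANT: {message.content}\n"
    # leave the prompt open for the assistant's next turn
    prompt += "ASSISTANT: "
    return prompt

def vicuna_completion_to_prompt(completion):
    return f"{SYSTEM_PROMPT}\n\nUSER: {completion}\nASSISTANT: "
then pass those in as messages_to_prompt / completion_to_prompt when building the LlamaCPP object, and swap the global tokenizer the same way as above but pointed at the matching model (e.g. AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5").encode)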
awesome, this gives me direction. I'll get spelunking and keep you posted πŸ˜‰