I am using llama-2 locally for a RAG

I am using llama-2 locally in a RAG pipeline via llama-cpp-python. I don't want to use the default system_prompt. How do I change it? I tried passing the system_prompt argument to LlamaCPP(), but it didn't work:

import torch

# imports assume the pre-0.10 llama_index package layout (ServiceContext era)
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5", device=device, cache_folder=models_dir
)
n_gpu_layers = 0 if device.type == "cpu" else -1  # compare device.type, not the device object, to a string
llm = LlamaCPP(
    model_url=None,
    model_path=f'{models_dir}/llama-2-7b-chat.Q4_K_M.gguf',
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
    system_prompt="",
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)
7 comments
Modify these functions:

messages_to_prompt=messages_to_prompt,
completion_to_prompt=completion_to_prompt,
(keep in mind this is specific to llama2-chat)
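
As a rough illustration (not from the thread): a custom pair of formatters can hard-code your own system prompt in the Llama-2 chat template and be passed to LlamaCPP in place of the stock llama_utils helpers. The my_* names and the prompt text below are made up for the example; messages are assumed to be llama_index ChatMessage objects with .role and .content:

my_system_prompt = "You are a helpful assistant. Answer using only the provided context."

def my_completion_to_prompt(completion: str) -> str:
    # Llama-2 chat template with a custom <<SYS>> block instead of the default one
    return f"<s>[INST] <<SYS>>\n{my_system_prompt}\n<</SYS>>\n\n{completion} [/INST]"

def my_messages_to_prompt(messages) -> str:
    # Simplified single-turn handling: fold any system message into the <<SYS>> block
    # and concatenate the remaining turns. The stock llama_utils version handles
    # multi-turn chats more carefully.
    system = my_system_prompt
    turns = []
    for message in messages:
        if message.role == "system":
            system = message.content
        else:
            turns.append(message.content)
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n" + "\n".join(turns) + " [/INST]"

llm = LlamaCPP(
    model_path=f'{models_dir}/llama-2-7b-chat.Q4_K_M.gguf',
    messages_to_prompt=my_messages_to_prompt,
    completion_to_prompt=my_completion_to_prompt,
    # ...remaining kwargs as in the snippet above
)

If your installed llama_index version's llama_utils helpers already accept a system_prompt keyword, functools.partial(messages_to_prompt, system_prompt=...) may be enough instead of writing your own; check the signatures in your version.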
Thanks @Logan M! I knew that, but I didn't want to add more code or change the llama_utils.py file. I thought there would be an easier way to just pass the system prompt; is there not?
Since those default utils insert the official default Llama-2 system prompt, you'll need to modify them
(imo llamacpp is so hard to use. I've found using ollama is 10000x better because it handles all the prompt formatting)
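
For comparison, a minimal sketch of that Ollama route with the same pre-0.10 llama_index API, assuming a local Ollama server with the llama2 model already pulled and reusing the embed_model from the question; the system prompt text is again just an example:

from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import Ollama

# Ollama applies the model's own chat template itself, so no
# messages_to_prompt / completion_to_prompt plumbing is needed here.
llm = Ollama(
    model="llama2",
    temperature=0.1,
    request_timeout=120.0,
    # system_prompt relies on the field exposed by the base llama_index LLM class
    system_prompt="You are a helpful assistant. Answer using only the provided context.",
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)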