A community member is using Llama 2 locally for a RAG pipeline with llama-cpp-python and wants to change the default system_prompt, but setting it in the LlamaCPP() constructor did not work. Other community members suggest overriding the messages_to_prompt and completion_to_prompt functions, or copying and modifying the original code from the llama_index repository. The original poster, however, does not want to add more code or edit the llama_utils.py file and is looking for an easier way to pass the system prompt. One community member notes that the ollama library is more convenient than llama-cpp-python here because it handles the prompt formatting itself.
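For context, the suggested fix amounts to wrapping the prompt formatters and handing the wrappers to LlamaCPP. Below is a minimal sketch, not the exact code discussed in the thread: it assumes the llama_index 0.9.x import paths, that the stock llama_utils helpers accept an optional system_prompt argument (recent releases do), and a placeholder CUSTOM_SYSTEM_PROMPT string.

from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

# Placeholder system prompt; replace with whatever the pipeline needs.
CUSTOM_SYSTEM_PROMPT = "Answer using only the provided context."

def custom_messages_to_prompt(messages):
    # Delegate to the stock Llama 2 formatter, overriding its default system prompt.
    return messages_to_prompt(messages, system_prompt=CUSTOM_SYSTEM_PROMPT)

def custom_completion_to_prompt(completion):
    return completion_to_prompt(completion, system_prompt=CUSTOM_SYSTEM_PROMPT)

The two wrappers would then be passed as messages_to_prompt=custom_messages_to_prompt and completion_to_prompt=custom_completion_to_prompt in the LlamaCPP(...) constructor, in place of the stock functions used in the snippet below.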
I am using llama-2 locally for a RAG pipeline using llama-cpp-python. I don't want to use the default system_prompt. How do I change it? I tried using the system_prompt argument in LlamaCPP() but it didn't work:
import torch
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device=device,
    cache_folder=models_dir,
)
# offload all layers to the GPU when one is available, none otherwise
n_gpu_layers = 0 if device.type == "cpu" else -1
llm = LlamaCPP(
    model_url=None,
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
    system_prompt="",
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)
Thanks @Logan M! I knew that but didn't want to add more code or change the llama_utils.py file. I thought there would be an easier way to just pass the system prompt. Is there not?
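One lighter-weight option, if the installed llama_index version's llama_utils formatters accept an optional system_prompt argument (recent 0.9.x releases do), is to bind the custom prompt with functools.partial instead of writing new wrapper functions. This is a sketch under that assumption, reusing the placeholder CUSTOM_SYSTEM_PROMPT and the models_dir variable from the snippet above.

from functools import partial

from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

# Placeholder system prompt, as in the earlier sketch.
CUSTOM_SYSTEM_PROMPT = "Answer using only the provided context."

llm = LlamaCPP(
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    # Bind the custom system prompt to the stock Llama 2 formatters.
    messages_to_prompt=partial(messages_to_prompt, system_prompt=CUSTOM_SYSTEM_PROMPT),
    completion_to_prompt=partial(completion_to_prompt, system_prompt=CUSTOM_SYSTEM_PROMPT),
    # remaining arguments as in the original snippet
)

The partial objects keep the stock Llama 2 prompt formatting intact and only swap the system prompt, so no llama_utils code has to be copied or edited.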