I am running Llama 2 locally through llama-cpp-python for a RAG pipeline. I don't want to use the default system_prompt. How do I change it? I tried passing the system_prompt argument to LlamaCPP(), but it didn't work:
import torch
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device=device,
    cache_folder=models_dir,
)

# compare the device type, not the device object, against the string
n_gpu_layers = 0 if device.type == "cpu" else -1

llm = LlamaCPP(
    model_url=None,
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
    system_prompt="",
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)
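To confirm the argument wasn't making it into the prompt, I checked what the stock completion_to_prompt helper produces on its own (a quick check, assuming it is the helper from llama_index.llms.llama_utils, as imported above):

from llama_index.llms.llama_utils import completion_to_prompt

# With no system_prompt passed explicitly, the helper wraps the text in
# Llama-2 [INST] / <<SYS>> formatting using the library's built-in default
# system prompt, which is exactly what I want to override.
print(completion_to_prompt("What is retrieval-augmented generation?"))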
Thanks @Logan M! I knew that, but I didn't want to add more code or modify llama_utils.py. I thought there would be an easier way to just pass the system prompt; is there not?
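For context, the extra code I was hoping to avoid looks roughly like this: binding my own system prompt onto the stock helpers with functools.partial before handing them to LlamaCPP. This is a sketch, assuming both helpers accept the optional system_prompt keyword (they do in the llama-index version I'm on); the prompt text itself is just a placeholder.

from functools import partial

from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

# hypothetical custom system prompt
MY_SYSTEM_PROMPT = "You answer strictly from the retrieved context."

llm = LlamaCPP(
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # bind the custom system prompt into the stock Llama-2 formatters
    # instead of relying on LlamaCPP's system_prompt argument
    messages_to_prompt=partial(messages_to_prompt, system_prompt=MY_SYSTEM_PROMPT),
    completion_to_prompt=partial(completion_to_prompt, system_prompt=MY_SYSTEM_PROMPT),
    verbose=False,
)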