llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
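from llama_cpp import LlamaGrammar
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path=model_path,
    temperature=0.0,
    max_new_tokens=max_output_tokens,
    context_window=context_window,
    model_kwargs={"n_gpu_layers": 50},  # load-time args for llama_cpp.Llama
    generate_kwargs={  # per-call sampling args
        "grammar": LlamaGrammar.from_file("/home/_LLM/llama.cpp/grammars/json.gbnf"),
        "repeat_penalty": 1.0,  # 1.0 disables the penalty; 0.0 does not
    },
    verbose=True,
)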
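The architecture name in the error is read from the model file itself: llama.cpp looks up the GGUF metadata key `general.architecture` and fails with "unknown model architecture" when the build in use has no entry for that string (for example, a build made before the architecture was added). The sketch below reads that metadata directly so you can confirm what the file declares. It assumes the GGUF v2/v3 header layout from the ggml GGUF specification; `read_gguf_metadata` is a hypothetical helper written for illustration, not part of llama.cpp, llama-cpp-python, or the `gguf` package.

```python
import struct
from typing import Any, BinaryIO, Dict

# GGUF metadata value type ids (assumed from the GGUF spec) -> struct formats
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one little-endian value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_str(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    n = _read(f, "<Q")
    return f.read(n).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        # Arrays: uint32 element type, uint64 count, then the elements
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path: str) -> Dict[str, Any]:
    """Return the key/value metadata section of a GGUF v2/v3 file."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError(f"GGUF v{version} uses 32-bit lengths; not handled here")
        _tensor_count = _read(f, "<Q")  # tensor infos follow the metadata; skipped
        kv_count = _read(f, "<Q")
        meta: Dict[str, Any] = {}
        for _ in range(kv_count):
            key = _read_str(f)
            vtype = _read(f, "<I")
            meta[key] = _read_value(f, vtype)
        return meta
```

Usage: `read_gguf_metadata(model_path)["general.architecture"]` should return `'gemma2'` for this model. The helper parses every metadata entry, including the large tokenizer arrays, so it can take a moment on real models.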