I think the default value for temperature is 0 and the default context_window is 4096, at least in the HuggingFace case.
You can change these parameters to suit your needs simply by passing the values when you initialise the llm object.
If you are running Llama locally via HuggingFace, you can do the following:
import torch
from llama_index.llms import HuggingFaceLLM

# selected_model is the HuggingFace model id (e.g. a Llama 2 checkpoint) and
# query_wrapper_prompt is the prompt template you defined for that model
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=2048,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=selected_model,
    model_name=selected_model,
    device_map="auto",
    # change these settings below depending on your GPU
    model_kwargs={"torch_dtype": torch.float16, "load_in_8bit": True},
)
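Once the llm object is created you can plug it into an index and query it. A minimal sketch, assuming the llama_index version from the linked docs; the "./data" directory and the query string are placeholders:

from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader

# wire the configured llm into a service context; embed_model="local" uses a
# local HuggingFace embedding model so no OpenAI key is needed
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# query through the locally running Llama model
print(index.as_query_engine().query("What does this document say about X?"))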
Find more here:
https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SimpleIndexDemoLlama-Local.html
https://gpt-index.readthedocs.io/en/latest/examples/llm/llama_2.html#configure-model
You can also run Llama from other platforms, either with the default parameters or with parameters of your choice, as shown in the second link; see the sketch below.
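For example, the second link covers Llama 2 hosted on Replicate. A minimal sketch, assuming the Replicate integration from that docs version; the API token and model identifier are placeholders you would substitute:

import os
from llama_index.llms import Replicate

# Replicate needs an API token in the environment
os.environ["REPLICATE_API_TOKEN"] = "<your-replicate-token>"

llm = Replicate(
    # use the full Llama 2 model identifier from Replicate / the linked docs
    model="replicate/llama-2-70b-chat",
    temperature=0.1,
    context_window=4096,
)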