To make answers more deterministic and consistent for the same query when using llama-index for RAG with a local Llama 2 model (via LlamaCPP) and the default embeddings, you can adjust the model's temperature parameter.
The temperature parameter controls the randomness of the model's predictions: a higher temperature makes the output more diverse and random, while a lower temperature makes it more deterministic and focused on the most likely token sequence.
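You can see this effect directly in the softmax that turns the model's raw logits into sampling probabilities. This is a self-contained sketch with made-up logit values, not llama-index code:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to sampling probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores

# Low temperature: probability mass concentrates on the top-scoring token,
# so sampling almost always picks the same token.
low = softmax_with_temperature(logits, 0.1)

# High temperature: the distribution flattens, so sampling is more random.
high = softmax_with_temperature(logits, 2.0)
```

With a temperature of 0.1, the top token ends up with nearly all the probability mass; with a temperature of 2.0, the three tokens are much closer together, which is why higher temperatures produce more varied answers.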
Here is how you can set the temperature when creating an instance of the model:
from llama_index.llms import LlamaCPP  # the import path can differ between llama-index versions
llama2_model = LlamaCPP(temperature=0.1)  # a lower temperature makes the output more deterministic
Please note that the best temperature value varies with the specific use case and may require some experimentation; a temperature at or near 0 makes sampling approach greedy decoding, which is the most deterministic setting.