The community member is building an app using Llama-Index and Ollama models and is trying to figure out how to limit the number of tokens the model outputs in response to a query. They have tried using max_tokens, new_max_tokens, num_output, and token_limit, but none of these have worked. The community members are seeking guidance or solutions from the community on how to properly limit the model's response.
The comments suggest trying num_predict, but the community member says it didn't work. They also note that num_output appears in the LLMMetadata class, but it's not in the init and not a Field, so they guess it's not implemented yet. Another community member suggests that it's likely a keyword argument that "just sails through".
The community members discuss the possibility of using additional_kwargs={"num_predict": 256} for Ollama, but note that this would just cut off the response if it's longer than the limit, rather than tailoring the response to fit it. The final comment suggests that "prompt engineering" might be a way to get responses that actually fit within the limit.
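For reference, here is a minimal sketch of the additional_kwargs approach discussed above, combined with a brevity instruction in the prompt. The model name "llama3" and the prompt text are placeholders, and this assumes the llama-index Ollama integration forwards additional_kwargs to the Ollama API as generation options:

```python
from llama_index.llms.ollama import Ollama

# num_predict is an Ollama generation option: a hard cap on output tokens.
# It truncates the response rather than making the model write concisely.
llm = Ollama(
    model="llama3",  # placeholder; use whatever model you have pulled
    request_timeout=30.0,
    additional_kwargs={"num_predict": 256},
)

# Prompt engineering can steer the model toward answers that fit the cap.
response = llm.complete("In no more than two sentences, explain what a token is.")
print(response.text)
```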
Hey guys, so I'm building an app using Llama-Index and Ollama models. I'm trying to figure out the argument to limit the number of tokens the model outputs when responding to a query. I'm currently using max_tokens. I have also tried new_max_tokens, num_output, and token_limit. None of these have been able to limit the model's response. Just wanted to see if anyone has figured out the proper argument.
Again, this is using Llama-Index with the import:

```python
from llama_index.llms.ollama import Ollama
```

Here's my current setup:

```python
return Ollama(
    model=llm_config["model"],
    request_timeout=30.0,
    device=llm_config["device"],
    temperature=temperature,
    max_tokens=100,
)
```
Any solutions/guidance/links to repos or docs/help would be phenomenal. I was told this was a question for Llama-Index by the guys at Ollama. Thanks!
I tried that and it didn't work. I was looking through the base.py file and see that it's in the LLMMetadata class as num_output. But it's not in the init and it's not a Field like the temperature and context_window arguments are. I'm guessing it's just not implemented yet?
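A quick way to check this is to inspect the metadata object directly. A minimal sketch (the model name is a placeholder), assuming the standard llama-index metadata property:

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", request_timeout=30.0)  # placeholder model name

# LLMMetadata exposes num_output, but as noted above it isn't declared
# as a Field on the Ollama class, so passing num_output=... at init
# time doesn't affect generation; this just prints the default.
print(llm.metadata.num_output)
```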