What's interesting is that I am able to run the model with the ollama command directly on the server, though.
Memory will grow until it reaches the max context limit. It's lazily allocated.
Setting a limit on the context window size is the way to limit memory usage, yes.
llm = Ollama(..., context_window=3000), for example, may help limit memory usage, but the lower you set it, the less context fits into the LLM, which may increase the number of LLM calls needed to run a query.
I will try this out. Thanks for your help!
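For reference, here is a minimal sketch of the suggestion above, assuming the LlamaIndex Ollama wrapper; the model name "llama3" is a placeholder for illustration, so adjust it to whatever model you run:

```python
# Minimal sketch, assuming the LlamaIndex Ollama wrapper.
from llama_index.llms.ollama import Ollama

# Capping context_window bounds how much memory Ollama can lazily
# allocate as the context fills. A smaller window uses less memory,
# but fits less context per call, which can mean more LLM calls
# per query.
llm = Ollama(
    model="llama3",       # placeholder model name
    context_window=3000,  # cap the context to limit memory growth
)

print(llm.complete("Hello!"))
```

The trade-off is between memory and call count, so a reasonable approach is to set context_window to the largest value your server's memory comfortably allows.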