`context_window` value, but at the same time I know it's limited by the LLM you're using, so it seems indeed to be a dead end.

If you ran `llm.complete("hi")`, I bet it would work and use a set amount of memory. With `llm.complete("hi " * 50)`, the memory usage would increase, but level off if you ran it again. But increase that multiplier further, and more memory will be allocated, since you are sending text through newer, not-yet-allocated parts of the model's context.
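If you want to watch that happen, here's a minimal sketch (assuming `llm` is the same completion object as above, and using `psutil`, a third-party package, to read this process's resident memory):

```python
import psutil  # pip install psutil

def rss_mb() -> float:
    """Resident memory of the current process, in MiB."""
    return psutil.Process().memory_info().rss / (1024 ** 2)

# `llm` is assumed to be the completion object from the discussion above.
# Repeating a multiplier should show memory leveling off; raising it
# should allocate more, since longer prompts touch more of the context.
for multiplier in (1, 50, 50, 500, 500):
    llm.complete("hi " * multiplier)
    print(f"multiplier={multiplier:<4} rss={rss_mb():.1f} MiB")
```

With a local model I'd expect the two repeated runs at each multiplier to report roughly the same RSS, with the jumps happening only when the multiplier grows.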