The community members are discussing how to switch LLMs (Large Language Models) when using the create-llama stack on LlamaIndex, and how to fix the max-token problem. They suggest modifying the LLM in the service context, which may live in the llamaindex-streaming.ts file. One community member shares some code related to creating a parser and stream transformer. Another suggests changing the model name in the constants.ts file or modifying a specific line of code. The community members also discuss whether the chat history resets after each run or refresh, and how to automatically reduce the size of the input context to avoid the max-token error.
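For context, swapping the model in the service context might look roughly like the sketch below. This is only illustrative: the exact exports (`OpenAI`, `serviceContextFromDefaults`) and option names vary between llamaindex TS versions, and the model name shown is an assumption, not something from the thread.

```ts
// Hypothetical sketch of changing the LLM used by the create-llama chat route.
// Export names and options depend on the pinned llamaindex version.
import { OpenAI, serviceContextFromDefaults } from "llamaindex";

// Picking a model with a larger context window makes it less likely that a
// long chat history overflows an 8192-token limit.
const llm = new OpenAI({
  model: "gpt-3.5-turbo-16k", // assumed model name; use whatever your account supports
  temperature: 0.1,
});

export const serviceContext = serviceContextFromDefaults({ llm });
```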
Or do you know a way to automatically reduce the size of the input context, so that you don't keep hitting this error: "This model's maximum context length is 8192 tokens. However, your messages resulted in 8209 tokens. Please reduce the length of the messages."
I'm waaay less familiar with the TS library, so I'm not immediately sure how to fix that. Something about limiting the chat memory somewhere, I'm guessing.
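On that last point, one library-agnostic way to keep the input under the model's limit is to drop the oldest turns before each request. A minimal sketch follows; the `ChatMessage` shape, the chars/4 token estimate, and the helper name are all assumptions for illustration (a real tokenizer such as tiktoken would count tokens more accurately), not part of create-llama.

```ts
// Hypothetical helper: trim the oldest chat messages until a rough token
// estimate fits under the model's context window.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude estimate: ~4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

export function trimChatHistory(
  messages: ChatMessage[],
  maxTokens = 8192,
  reservedForResponse = 512,
): ChatMessage[] {
  const budget = maxTokens - reservedForResponse;

  // Keep any system prompt; trim only the conversational turns.
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  const kept: ChatMessage[] = [];
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  // Walk backwards from the newest message, keeping as many as fit;
  // the most recent message is always kept even if it alone exceeds the budget.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > budget && kept.length > 0) break;
    kept.unshift(rest[i]);
    used += cost;
  }

  return [...system, ...kept];
}
```

Calling `trimChatHistory(messages)` on the chat history before handing it to the LLM would silently discard the oldest turns instead of surfacing the max-token error to the user.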