A community member is experiencing issues with LlamaIndex when using ElasticsearchStore with the ChatEngine (chat mode "context") and SentenceWindowNodeParser, where the LLM response is sometimes incomplete and returns cut-off chunks. The issue emerged after upgrading LlamaIndex from version 0.8.47 to 0.9.4, even though the chatbot pipeline remained the same. The community member has checked the logs and found that the relevant nodes are correctly retrieved from the Elasticsearch index, and the LLM completion input looks fine. They are also using Azure OpenAI gpt-35-turbo and have switched the embeddings from OpenAIEmbedding to AzureOpenAIEmbedding.
In the comments, a community member suggests that the issue might be related to the default token limit of the context chat engine and recommends raising it by creating the engine with a larger ChatMemoryBuffer.
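A minimal sketch of that suggestion for LlamaIndex 0.9.x, assuming a plain VectorStoreIndex for illustration (the original setup uses ElasticsearchStore and SentenceWindowNodeParser); the data path and token limit are example values:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.memory import ChatMemoryBuffer

# Small illustrative index; in the original pipeline the index is backed
# by ElasticsearchStore with SentenceWindowNodeParser-produced nodes.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Give the context chat engine a larger memory budget than the default,
# so retrieved context plus chat history is less likely to be truncated.
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)  # example value

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
)

response = chat_engine.chat("What does the document say about the topic?")
print(response)
```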
Is someone else having issues with LlamaIndex when using ElasticsearchStore with the ChatEngine (chat mode "context") and SentenceWindowNodeParser, where the LLM response is sometimes incomplete and comes back as cut-off chunks? The issue emerged when I upgraded LlamaIndex from 0.8.47 to 0.9.4. The pipeline for my chatbot stayed the same across the two versions, but now the responses are completely unpredictable. In my logs, I can see that it correctly retrieves the relevant nodes from my Elasticsearch index, and the LLM completion input seems fine. Since I'm using Azure OpenAI gpt-35-turbo, I've also adjusted the embeddings to use AzureOpenAIEmbedding instead of OpenAIEmbedding, as mentioned in the changelog. Does someone have any ideas?
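For reference, a sketch of what that embedding adjustment might look like in LlamaIndex 0.9.x; the deployment names, endpoint, and API version are placeholders, not values from the original post:

```python
from llama_index import ServiceContext
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.llms import AzureOpenAI

# Placeholder credentials and deployment names; substitute your own.
llm = AzureOpenAI(
    model="gpt-35-turbo",
    deployment_name="my-gpt-35-turbo",          # hypothetical deployment name
    api_key="...",
    azure_endpoint="https://<resource>.openai.azure.com/",
    api_version="2023-07-01-preview",
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding",             # hypothetical deployment name
    api_key="...",
    azure_endpoint="https://<resource>.openai.azure.com/",
    api_version="2023-07-01-preview",
)

# Wire both into the service context used when building/querying the index.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
```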