Is someone else having issues with LlamaIndex when using ElasticsearchStore with the ChatEngine (chat mode "context") and SentenceWindowNodeParser, where the LLM response is sometimes incomplete and comes back with cut-off chunks? The issue emerged when I upgraded LlamaIndex from 0.8.47 to 0.9.4. The pipeline for my chat bot stayed the same across the two versions, but now the responses are completely unpredictable. In my logs I can see that it correctly retrieves the relevant nodes from my Elasticsearch index, and the LLM completion input seems fine. Since I'm using Azure OpenAI gpt-35-turbo, I've also adjusted the embeddings to use AzureOpenAIEmbedding instead of OpenAIEmbedding, as mentioned in the changelog. Does anyone have any ideas?
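
For reference, here is a minimal sketch of the kind of setup described above on llama-index 0.9.x. The endpoint, deployment, index name, and API version are placeholders, and any node post-processors the original pipeline may use are omitted, so treat this as an illustration rather than the exact pipeline:

Python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.llms import AzureOpenAI
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.vector_stores import ElasticsearchStore

# Azure OpenAI LLM and embedding model (deployment names and endpoint are placeholders)
llm = AzureOpenAI(
    model="gpt-35-turbo",
    engine="my-gpt-35-turbo-deployment",
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_key="...",
    api_version="2023-07-01-preview",
)
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding-deployment",
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_key="...",
    api_version="2023-07-01-preview",
)

# Sentence-window parsing: index single sentences, keep surrounding sentences in metadata
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model, node_parser=node_parser
)

# Existing Elasticsearch index used as the vector store
vector_store = ElasticsearchStore(index_name="my-index", es_url="http://localhost:9200")
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

# Context chat engine: retrieves nodes and stuffs them into the prompt for each turn
chat_engine = index.as_chat_engine(chat_mode="context")
response = chat_engine.chat("What does the knowledge base say about X?")
print(response)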
2 comments
I think I know the issue

Try creating the context chat engine with a memory that has a high token limit

Python
from llama_index.memory import ChatMemoryBuffer

# Create a chat memory buffer with a generous token limit
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

# Pass the memory explicitly instead of relying on the default
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
I think I need to update the default token limit for the context chat engine.
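
To check the memory is actually being used, a quick follow-up on the snippet above (the question text is just an example):

Python
# Ask a question and inspect the history accumulated in the buffer
response = chat_engine.chat("What does the indexed documentation say about deployment?")
print(response)
print(memory.get_all())  # user/assistant messages stored in the memory buffer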