Troubleshooting bot failure after 30 questions in a single session

The bot is failing after 30 questions in a single session. I am getting None as the response, and I am unable to debug what the reason may be.
37 comments
I don't think that should be the case. Could you test it again and share the code and the response as well?
The error is "Calculated available context size -17 was not non-negative"
Also "Available context size -80 was not non-negative"
@WhiteFang_Jr @Logan M
Seems like either:
  • your context window is too small
  • you retrieved too much text
  • your chat history is too long
How do I increase the limit?
The chat history is for 30 questions.
How do I keep only the last 3 chat messages for individual users? I believe chat_engine.reset() resets all the history.
By default, the token limit on the memory is based on the LLM you are using. It defaults to around 75% of the LLM's context window.

memory = ChatMemoryBuffer.from_defaults(token_limit=5000) would be the manual way to configure it.

I would take a second look as well at:
  • What LLM are you using? How big is the context window?
  • How much text are you retrieving?
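On the per-user question above: a minimal sketch of keeping a separate memory per user, so resetting one user doesn't wipe the others (the user_memories dict and get_chat_engine_for_user helper are my own illustration, not LlamaIndex API):

Python
from llama_index.core.memory import ChatMemoryBuffer

# Hypothetical per-user registry: one memory buffer per user id.
user_memories = {}

def get_chat_engine_for_user(index, user_id):
    # Reuse the user's memory if it already exists, otherwise create it.
    memory = user_memories.setdefault(
        user_id, ChatMemoryBuffer.from_defaults(token_limit=3000)
    )
    return index.as_chat_engine(
        chat_mode="condense_plus_context",
        memory=memory,
    )

# Resetting one user's history leaves everyone else's intact:
# user_memories["some_user_id"].reset()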
I am using the gpt-3.5 LLM, and after 40 questions it stops.
Each question may have a 6-7 line answer.
And I am using index.as_chat_engine as my query engine with chat mode condense_plus_context.
Did you set any top k? What's your chunk size?

Anyways, just pass in the memory (or try using 4o-mini; the context window is way larger, 3.5-turbo is only 16k)

Python
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=8000)

chat_engine = index.as_chat_engine(..., memory=memory)
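For the 4o-mini suggestion, setting it globally looks roughly like this (a sketch, assuming the llama-index-llms-openai package is installed and an OpenAI API key is configured):

Python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# gpt-4o-mini has a much larger context window than gpt-3.5-turbo's 16k.
Settings.llm = OpenAI(model="gpt-4o-mini")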
I have not set any top k or chunk size. I loaded the PDFs using SimpleDirectoryReader, created the index using VectorStoreIndex.from_documents, and then used index.as_chat_engine.
Does setting a top k read better?
Which is best for reading documents?
That's fine for the defaults. The default top-k is 2 and the default chunk size is 1024.
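If you ever want to set those explicitly instead of relying on the defaults, a rough sketch (the "./pdfs" path is just a placeholder; chunk size applies when the index is built, top-k at retrieval time):

Python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# Chunk size is applied when documents are split at index-build time.
Settings.chunk_size = 1024

documents = SimpleDirectoryReader("./pdfs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Top-k is applied at retrieval time.
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    similarity_top_k=2,
)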
I see the response time increasing after I use 4o-mini.
Can you please suggest how I can reduce the response time?
Response time is a function of how much input goes to the LLM and how much output comes back.

The input is limited by the top k, the length of the user query, and the limit on the memory.
You might want to set a lower token limit on the memory.
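Concretely, something like this would shrink what gets sent to the LLM each turn (a sketch; the numbers are just examples to tune):

Python
from llama_index.core.memory import ChatMemoryBuffer

# A smaller memory plus a smaller top-k means less input per LLM call,
# which generally means a faster response.
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    similarity_top_k=2,
)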
Can the chat engine automatically erase previous chat histories and just keep the recent ones if we decrease the token limit?
Yeah, that's what the token limit does here.
It takes the last X messages that fit within the limit.
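You can see that behavior directly on the buffer (a quick sketch with a deliberately tiny limit):

Python
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

# Tiny limit so truncation kicks in after a few messages.
memory = ChatMemoryBuffer.from_defaults(token_limit=50)

for i in range(20):
    memory.put(ChatMessage(role="user", content=f"question {i}"))
    memory.put(ChatMessage(role="assistant", content=f"answer {i}"))

# get() returns only the most recent messages that fit in the limit;
# older ones are dropped, while get_all() still holds the full history.
print(memory.get())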
Got it, thanks a lot @Logan M
I had another question: do we have a typing loader that types while chatting?
@Logan M after changing the LLM to 4o-mini and decreasing the token limit to 1000, the response time has increased drastically.
How do I decrease the response time?
@Logan M Can you please help here?
Seems like gpt-4o-mini is just slow?
Any other LLM you would suggest with a large context window but good document retrieval responses?
Or can some answers be cached so that we don't need to call the LLM every time?
@Logan M Can I find out the currently available context size, the memory used, and the max tokens used, so that when it approaches the limit I can reset the variables and the chat engine before it breaks with the error?