Troubleshooting bot failure after 30 questions in a single session

The bot is failing after 30 questions in a single session. I'm getting None as the response, and I am unable to debug what the reason may be.
37 comments
I don't think that should be the case. Could you test it again and share the code and the response as well?
The error is: Calculated available context size -17 was not non-negative
Available context size -80 was not non-negative
@WhiteFang_Jr @Logan M
Seems like either:
  • your context window is too small
  • you retrieved too much text
  • your chat history is too long
How do I increase the limit?
The chat history covers 30 questions
How do I keep only the last 3 chat messages for individual users? I believe chat_engine.reset() resets all the history
By default, the token limit on the memory is based on the LLM you are using. It defaults to around 75% of the LLM's context window.

memory = ChatMemoryBuffer.from_defaults(token_limit=5000) would be the manual way to configure it

I would take a second look as well at:
  • what LLM are you using? How big is the context window?
  • how much text are you retrieving?
I am using the gpt-3.5 LLM, and after 40 questions it stops
Each question may have a 6-7 line answer
And I am using index.as_chat_engine as my query engine, with chat mode condense_plus_context
Did you set any top-k? What's your chunk size?

Anyway, just pass in the memory (or try using gpt-4o-mini; its context window is way larger, while 3.5-turbo is only 16k)

Plain Text
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=8000)

chat_engine = index.as_chat_engine(..., memory=memory)
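
For the earlier question about keeping history per user: a minimal sketch is to hold one ChatMemoryBuffer per user, so resetting one user's history does not wipe the others. The user_memories dict and get_chat_engine_for_user helper are hypothetical names, not part of llama_index, and the token_limit value is just an example.

Plain Text
from llama_index.core.memory import ChatMemoryBuffer

# One memory buffer per user id (token_limit value is just an example)
user_memories = {}

def get_chat_engine_for_user(index, user_id):
    if user_id not in user_memories:
        user_memories[user_id] = ChatMemoryBuffer.from_defaults(token_limit=3000)
    return index.as_chat_engine(
        chat_mode="condense_plus_context",
        memory=user_memories[user_id],
    )

# Clearing only one user's history:
# user_memories["alice"].reset()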
I have not set any top-k or chunk size. I loaded the PDFs using SimpleDirectoryReader, created the index using VectorStoreIndex.from_documents, and then called index.as_chat_engine.
Does setting a top-k read better?
Which is best for reading documents?
That's fine for the defaults. The default top-k is 2 and the default chunk size is 1024
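
If you do want to set them explicitly, a rough sketch looks like this. It assumes the chunk size is set globally via Settings before the index is built and that as_chat_engine forwards similarity_top_k to the underlying retriever; the "data" folder is a placeholder.

Plain Text
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# Smaller chunks mean less retrieved text per question
Settings.chunk_size = 512

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    similarity_top_k=2,  # number of chunks retrieved per question
)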
I see the response time increasing after I use 4o-mini and increase the memory
Can you please suggest how I can reduce the response time?
Response time is a function of how much input goes to the LLM and how much output comes back

The input is limited by the top-k, the length of the user query, and the token limit on the memory
You might want to set a lower token limit on the memory
Can the chat engine automatically erase previous chat history and just keep the recent messages if we decrease the token limit?
Yeah, that's what the token limit does here
It takes the last X messages that fit within the limit
Got it, thanks a lot @Logan M
I had another question: do we have a typing loader that types while chatting?
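
For a typing-style effect, one option is to stream the answer token by token; a minimal sketch, assuming chat_engine.stream_chat and its response_gen generator (the question string is just a placeholder):

Plain Text
# Streaming gives a "typing" effect: print (or send to the client) each token as it arrives
streaming_response = chat_engine.stream_chat("What does the document say about X?")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)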
@Logan M After changing the LLM to 4o-mini and decreasing the token limit to 1000, the response time has increased drastically
How do I decrease the response time?
@Logan M Can you please help here?
Seems like gpt-4o-mini is just slow?
Is there any other LLM you would suggest with a large context window but a good document-retrieval response time?
Or can some answers be cached so that we need not call the LLM every time?
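
Caching is not something the chat engine does automatically here, but a naive exact-match cache can skip the LLM call for repeated questions. A sketch: answer_cache and ask are made-up names, and because condense_plus_context also uses chat history, a cached answer is only an approximation.

Plain Text
answer_cache = {}

def ask(chat_engine, question):
    # Only saves an LLM call when the identical question is asked again
    key = question.strip().lower()
    if key in answer_cache:
        return answer_cache[key]
    response = chat_engine.chat(question)
    answer_cache[key] = str(response)
    return answer_cache[key]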
@Logan M Can I get to know the currently available context size, the memory used, and the max tokens used, so that when it approaches the limit I can reset the variables and the chat engine before it reaches the limit and breaks with an error?
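
One way to approximate this is to measure how full the memory buffer is and reset before it overflows. This sketch assumes ChatMemoryBuffer exposes tokenizer_fn, token_limit, and get_all(), and it counts only the buffered message text, not the prompt or retrieved context; the 80% threshold is just an example.

Plain Text
def memory_tokens_used(memory):
    # Rough count of tokens currently held in the chat history buffer
    return sum(
        len(memory.tokenizer_fn(msg.content or ""))
        for msg in memory.get_all()
    )

used = memory_tokens_used(memory)
if used > 0.8 * memory.token_limit:  # e.g. reset once the buffer is ~80% full
    chat_engine.reset()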