I think there's a small misunderstanding here.
Setting chunk_size_limit AND all the other size settings at the same time can lead to some errors; the math is a little confusing.
Basically, max_input_size should be the max context of the model (usually 4096)
num_output and max_tokens should be equal (like you have already)
chunk_size_limit has to be less than max_input_size - num_output - prompt_template_length
Since we can't know the prompt template length ahead of time, it's good to set the chunk_size_limit to a fairly conservative guess (in this case, something below ~3500 should work well!)
If you want, you can just leave chunk_size_limit undefined and it will be calculated internally
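To make the arithmetic concrete, here's a rough sketch (the prompt template size is just a guess on my part, it varies per prompt):

```python
# rough token budget check -- illustrative numbers only
max_input_size = 4096          # model context window
num_output = 512               # tokens reserved for the completion
prompt_template_tokens = 350   # assumption: rough guess, depends on the actual prompt

# chunk_size_limit has to stay below this
max_chunk_size = max_input_size - num_output - prompt_template_tokens
print(max_chunk_size)          # 3234 -> hence the "fairly conservative guess" advice above
```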
Hi @Logan M! Thanks for the clarification! It's still a bit confusing though, since I don't know how to control this process...
which parameters would you advise me not to touch? (at least until I'm 100% confident about what is going on)
does this look legit?
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from llama_index import LLMPredictor, PromptHelper, ServiceContext
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, max_tokens=512)
llm_predictor = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
decompose_transform = DecomposeQueryTransform(llm_predictor=llm_predictor, verbose=True)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = None
I would say either set the prompt helper/max_tokens settings OR set the chunk_size_limit settings, but not both at the same time
I think that will give a smoother experience (I hope) lol
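For example, the chunk_size_limit-only route would look roughly like this (just a sketch, assuming your llama_index version's ServiceContext.from_defaults accepts chunk_size_limit):

```python
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0))

# no PromptHelper / max_tokens here -- let llama_index derive the prompt budget internally
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size_limit=512,  # conservative chunk size; tune as needed
)
```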
another thing I wanted to ask about is ConversationBufferMemory
so this thing will hit the token limit anyway at some point, right?
after I got the out-of-token-limit error, I tried re-asking questions that had been answered previously
but got the error anyway
and could not even get an answer to a simple:
"Hi! How are you today?"
so the memory is never released; it just keeps filling up. Is that correct?
I'm actually not sure how that works on langchain's side 🤔 I would hope that it's smart enough to only fetch a certain amount of memory haha, but you might have to experiment a bit. I tried reading their docs but found nothing useful
in short, ConversationBufferMemory is not suitable for long dialogues, right?
I sent some questions, and then when a new question was sent, I checked what was inside the memory object
and it was the whole dialogue
so that whole bunch of text inside the memory is sent to the LLM along with the new prompt, right?
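for reference, this is roughly how I looked inside it (a sketch; I'm assuming load_memory_variables is the right way to peek at it):

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history")
memory.save_context({"input": "Hi!"}, {"output": "Hello!"})
memory.save_context({"input": "What's in the index?"}, {"output": "A bunch of docs."})

# the full dialogue so far, which is what gets prepended to the next prompt
print(memory.load_memory_variables({})["chat_history"])
```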
Well in langchain there are multiple calls to the LLM
Deciding to use a tool, finding the answer, generating a final response, all separate calls
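You can watch those separate calls go by if you turn on verbose mode (rough sketch of an older langchain agent setup; the tool and its description are placeholders, not your actual index):

```python
from langchain.agents import initialize_agent, Tool
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(temperature=0)
tools = [
    Tool(
        name="docs_index",  # placeholder tool wrapping your index query
        func=lambda q: "answer from the index",
        description="Useful for answering questions about the indexed docs.",
    )
]
memory = ConversationBufferMemory(memory_key="chat_history")

agent_chain = initialize_agent(
    tools, llm,
    agent="conversational-react-description",
    memory=memory,
    verbose=True,  # prints each thought / tool call / final answer as its own LLM round-trip
)
agent_chain.run(input="Hi! How are you today?")
```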
hm, interesting. Let me go through their memory docs then
Yeah, this is because of the memory
I conducted an experiment:
asked 1000 questions in a loop
question 65 ----> out-of-token-limit error
Initial thought --- question 65 fed a big chunk to the LLM, so it could not handle it
Turned the app off (memory cleaned)
Turned the app on (fresh memory)
Asked question 65 ----- LLM returned an answer
asked 1000 questions in a loop starting from q65
question 89 --- out of tokens
asked "Hi" ---- out of tokens
checked memory ---- it stores everything starting from the very first question
did `memory.clear()`
re-asked --- returned an answer
So I think I need to wrap it in try/except, and if it hits the token limit, clear the memory and try again
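something like this (untested sketch; I'm catching a broad exception and checking the message, since I'm not sure of the exact error type):

```python
def ask(agent_chain, memory, question):
    try:
        return agent_chain.run(input=question)
    except Exception as e:
        # assumption: the context-length error message mentions the token limit
        if "maximum context length" in str(e) or "token" in str(e).lower():
            memory.clear()  # drop the accumulated dialogue
            return agent_chain.run(input=question)  # retry with fresh memory
        raise
```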
Yeah, I was thinking that the buffer window memory would suit my case. Thank you @Logan M!
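i.e. something like this (sketch, assuming ConversationBufferWindowMemory with k as the number of exchanges to keep):

```python
from langchain.memory import ConversationBufferWindowMemory

# only the last k exchanges are kept, so the prompt stops growing without bound
memory = ConversationBufferWindowMemory(memory_key="chat_history", k=5)
```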
Btw, do you think it's worth adding this memory issue to the documentation? It took me a few hours to figure out, and it might not be obvious to other folks either.
Yea, it might be worthwhile to add, I suppose 🤔