Prompt helper

At a glance

The community members are discussing issues with configuring parameters for a language model, specifically the PromptHelper class from LlamaIndex and the ConversationBufferMemory class from LangChain. The main points are:

- Setting chunk_size_limit and other parameters can lead to errors, and the recommended approach is to either set the prompt_helper and max_tokens settings or the chunk_size_limit settings, but not both.

- The ConversationBufferMemory class may not be suitable for long dialogues, as it can hit the token limit and the memory may not be released properly, leading to issues with subsequent questions.

The community members suggest trying different memory types, such as BufferWindowMemory or SummaryBufferMemory, and potentially adding the memory issue to the LangChain documentation, as it may not be obvious to some users.

no, not true... I'm getting the error again, even though I set chunk_size_limit to 3000 and max_tokens to 512...

What is wrong here? :/

Plain Text
prompt_helper = PromptHelper(max_input_size=512, num_output=512, max_chunk_overlap=20)
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, max_tokens=512)
llm_predictor = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=3000)
26 comments
I think there's a small misunderstanding here

Setting chunk_size_limit AND all the other things can lead to some errors, the math is a little confusing

Basically, max_input_size should be the max context of the model (usually 4096)

num_output and max_tokens should be equal (like you have already)

chunk_size_limit has to be less than max_input_size - num_output - prompt_template_length

Since we can't know the prompt template length ahead of time, it's good to set the chunk_size_limit to a fairly conservative guess (in this case, something below ~3500 should work well!)
If you want, you can just leave chunk_size_limit undefined and it gets calculated internally
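To make that math concrete, here is a rough sketch of the arithmetic with the numbers from this thread (the 4096 context size and the ~300-token template length are assumptions, not measured values):

Plain Text
# illustrative arithmetic only; the prompt template length is a guess
max_input_size = 4096           # model context window (e.g. gpt-3.5-turbo)
num_output = 512                # should match max_tokens on the LLM
prompt_template_length = 300    # not known ahead of time, so estimate conservatively

headroom = max_input_size - num_output - prompt_template_length
print(headroom)                 # 3284 -> pick a chunk_size_limit comfortably below this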
Hi @Logan M ! Thanks for the clarification! It's still a bit confusing though, since I don't know how to control this process...
Which parameters do you advise me not to touch? (at least until I'm 100% confident about what is going on)
does this look legit?

Plain Text
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, max_tokens=512)
llm_predictor = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
decompose_transform = DecomposeQueryTransform(llm_predictor=llm_predictor, verbose=True)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = None
I would say either set the prompt helper/max_tokens settings OR set the chunk_size_limit settings, but not both at the same time

I think that will give a smoother experience (I hope) lol
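For illustration, the two alternatives might look roughly like this (a sketch built from the snippets above; the imports reflect the llama_index/langchain versions used in this thread and may differ in newer releases):

Plain Text
from llama_index import PromptHelper, LLMPredictor, ServiceContext
from langchain.chat_models import ChatOpenAI

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, max_tokens=512))

# Option A: configure the prompt helper (num_output matching max_tokens)
# and let the chunk size be derived internally
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)
service_context_a = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# Option B: only set chunk_size_limit and keep the default prompt helper
service_context_b = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=1024)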
Another thing I wanted to ask about is ConversationBufferMemory
so this thing will hit the token limit anyway at some point, right?
since, after I got the out-of-token-limit error, I tried re-asking questions that had been answered previously
but got the error anyway
and even could not get an answer for a simple:

"Hi! How are you today?"
the memory is not being released at any point, it just keeps filling up. Is that correct?
I'm actually not sure how that works on langchain's side 🤔 I would hope that it's smart enough to only fetch a certain amount of memory haha but you might have to experiment a bit. I tried reading their docs but found nothing useful
in short, ConversationBufferMemory is not suitable for long dialogues, right?
I tried sending some questions, and then when a new question was sent, I checked what was inside the object
and it was the whole dialogue
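One way to see that for yourself (a sketch using ConversationBufferMemory's standard accessors; the sample exchange is made up):

Plain Text
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history")
memory.save_context({"input": "Hi! How are you today?"}, {"output": "Doing well, thanks!"})
memory.save_context({"input": "What is PromptHelper?"}, {"output": "A helper for prompt sizing."})

print(memory.load_memory_variables({}))   # the full accumulated dialogue
memory.clear()                            # empties chat_history explicitly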
I can't say for sure. It looks like they have many memory types though https://python.langchain.com/en/latest/modules/memory/how_to_guides.html
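For example, the sliding-window variant keeps only the last k exchanges, so the prompt stays bounded on long dialogues (a sketch; k=3 is an arbitrary choice):

Plain Text
from langchain.memory import ConversationBufferWindowMemory

# keeps only the last k exchanges instead of the whole dialogue
memory = ConversationBufferWindowMemory(k=3, memory_key="chat_history")

for i in range(10):
    memory.save_context({"input": f"question {i}"}, {"output": f"answer {i}"})

print(memory.load_memory_variables({}))   # only the last 3 exchanges remain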
so that whole bunch of text inside the memory is sent to the LLM along with the new prompt, right?
Well in langchain there are multiple calls to the LLM

Deciding to use a tool, finding the answer, generating a final response, all separate calls
hm, interesting. Let me go through their memory docs then
Yeah, this is because of the memory
I conducted an experiment:


Plain Text
asked 1000 questions in a loop

question 65 ----> out of token limit

Initial thought --- question 65 fed a big chunk to the LLM, so it could not handle it

Turned the app off (memory cleared)

Turned the app on (fresh memory)

Asked question 65 ----- LLM returned an answer

asked 1000 questions in a loop, starting from question 65

question 89 --- out of token limit

asked "Hi" ---- out of token limit

checked memory ---- it stores everything starting from the very first question

did `memory.clear()`

re-asked --- it returned an answer


So I think I need to wrap it in a try/except, and if it hits the token limit, clear the memory and try again
what do you think?
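A minimal sketch of that fallback, assuming the agent_chain and memory objects from the earlier snippet and matching on the error message rather than a specific exception class (the exact exception type depends on the OpenAI/LangChain versions in use):

Plain Text
def ask_with_reset(agent_chain, memory, question):
    """Run the question; if the context overflows, clear the memory and retry once."""
    try:
        return agent_chain.run(input=question)
    except Exception as err:
        # e.g. "This model's maximum context length is 4097 tokens..."
        if "maximum context length" in str(err):
            memory.clear()                 # drop the accumulated dialogue
            return agent_chain.run(input=question)
        raise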
Yeah, I was thinking that buffer window is suitable for my case. Thank you @Logan M 👍

Btw, do you think it's worth adding the memory issue to the documentation? It took me a few hours to figure out, and it may not be obvious to other folks either.
Yea it might be worthwhile to add I suppose 🤔