I wonder if my llama_index.as_chat is

I wonder if my llama_index.as_chat is getting too much chat history in the prompt and causing this, or something? Any good ways of managing this? Or settings I can adjust to condense the chat history? Or will I perhaps need to make a custom layer to summarize the chat history?
The current memory is just a sliding window over the chat history. Been meaning to contribute different memory types but haven't had a chance yet 😅
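If you want to see or control that buffer directly, it's roughly this (a sketch; the imports below are the older paths, newer versions use llama_index.core.memory / llama_index.core.llms instead):
```python
from llama_index.llms import ChatMessage
from llama_index.memory import ChatMemoryBuffer

# The default memory is a token-limited sliding window over the chat history.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# Messages get appended as the conversation runs...
memory.put(ChatMessage(role="user", content="hey there"))
memory.put(ChatMessage(role="assistant", content="hello!"))

# ...and get() only returns the most recent messages that fit under
# token_limit, so older turns silently drop off the front.
print(memory.get())
```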
It doesn't cut out the system prompt eventually, does it?
Cause when my boy chats long enough it gets dementia
I’m not making a real boy I swear!
P.s. thanks for the help earlier. My bot is so fricken cool rn
[Attachment: IMG_5577.png]
How are you setting up the system prompt?
(the answer depends lol)
Well I’m passing it into the system prompt setting of the as_chat function. I also put it in the service context but the chatbot wasn’t using that one for some reason
so you are doing index.as_chat_engine(..., system_prompt=system_prompt)? And which chat mode?
Chat mode simple
ok nice, the system prompt will always be there then
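For anyone else reading, a minimal version of that setup looks roughly like this (the ./data path and the prompt text are just placeholders, and on newer versions the imports move under llama_index.core):
```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()  # placeholder data dir
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(
    chat_mode="simple",                            # plain LLM chat, no retrieval step
    system_prompt="You are a helpful assistant.",  # stays pinned at the top every turn
)

print(chat_engine.chat("hi!"))
```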
Oh good to know. I wonder why it starts getting incoherent then 🤔 maybe my model was only trained to handle a certain amount of message history
It could be 🤔 You might have to rely on shortening the memory buffer, or condensing it somehow
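e.g. a sketch of the shortening option, reusing index and system_prompt from the snippet above (the token_limit number is arbitrary, tune it to your model's context window):
```python
from llama_index.memory import ChatMemoryBuffer

# A smaller window means fewer old turns get carried into each prompt.
memory = ChatMemoryBuffer.from_defaults(token_limit=1000)

chat_engine = index.as_chat_engine(
    chat_mode="simple",
    system_prompt=system_prompt,
    memory=memory,  # replaces the default sliding-window buffer
)
```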
How did you guys go about running a standalone inference server? Is there an example somewhere?
I think llama-cpp has its own server package thing
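llama-cpp-python has a server extra that exposes an OpenAI-compatible HTTP API. A rough sketch (model path and port are just examples):
```python
# Start the server separately, e.g.:
#   pip install "llama-cpp-python[server]"
#   python -m llama_cpp.server --model ./models/your-model.gguf --port 8000
# Then any OpenAI-style client can talk to it; plain requests works too:
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "hello"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```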