I wonder if my llama_index.as_chat is

I wonder if my llama_index.as_chat is getting too much chat history in the prompt and causing this, or something? Any good ways of managing this? Or settings I can adjust to condense the chat history? Or will I perhaps need to make a custom layer to summarize the chat history?
The current memory is just a sliding window over the chat history. Been meaning to contribute different memory types but haven't had a chance yet 😅
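If you want to see or control that buffer directly, it's roughly this (a sketch; the imports below are the older paths, newer versions use llama_index.core.memory / llama_index.core.llms instead):
```python
from llama_index.llms import ChatMessage
from llama_index.memory import ChatMemoryBuffer

# The default memory is a token-limited sliding window over the chat history.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# Messages get appended as the conversation runs...
memory.put(ChatMessage(role="user", content="hey there"))
memory.put(ChatMessage(role="assistant", content="hello!"))

# ...and get() only returns the most recent messages that fit under
# token_limit, so older turns silently drop off the front.
print(memory.get())
```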
It doesn't cut out the system prompt eventually, does it?
Cause when my boy chats long enough it gets dementia
I’m not making a real boy I swear!
P.s. thanks for the help earlier. My bot is so fricken cool rn
[Attachment: IMG_5577.png]
How are you setting up the system prompt?
(the answer depends lol)
Well I’m passing it into the system prompt setting of the as_chat function. I also put it in the service context but the chatbot wasn’t using that one for some reason
so you are doing index.as_chat_engine(..., system_prompt=system_prompt)? And which chat mode?
Chat mode simple
ok nice, the system prompt will always be there then
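For anyone else reading, a minimal version of that setup looks roughly like this (the ./data path and the prompt text are just placeholders, and on newer versions the imports move under llama_index.core):
```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()  # placeholder data dir
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(
    chat_mode="simple",                            # plain LLM chat, no retrieval step
    system_prompt="You are a helpful assistant.",  # stays pinned at the top every turn
)

print(chat_engine.chat("hi!"))
```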
Oh good to know. I wonder why it starts getting incoherent then 🤔 maybe my model was only trained to handle a certain amount of message history
It could be 🤔 You might have to rely on shortening the memory buffer, or condensing it somehow
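e.g. a sketch of the shortening option, reusing index and system_prompt from the snippet above (the token_limit number is arbitrary, tune it to your model's context window):
```python
from llama_index.memory import ChatMemoryBuffer

# A smaller window means fewer old turns get carried into each prompt.
memory = ChatMemoryBuffer.from_defaults(token_limit=1000)

chat_engine = index.as_chat_engine(
    chat_mode="simple",
    system_prompt=system_prompt,
    memory=memory,  # replaces the default sliding-window buffer
)
```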
How did you guys go about running a standalone inference server? Is there an example somewhere?
I think llama-cpp has its own server package thing
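llama-cpp-python has a server extra that exposes an OpenAI-compatible HTTP API. A rough sketch (model path and port are just examples):
```python
# Start the server separately, e.g.:
#   pip install "llama-cpp-python[server]"
#   python -m llama_cpp.server --model ./models/your-model.gguf --port 8000
# Then any OpenAI-style client can talk to it; plain requests works too:
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "hello"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```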