
issues with chat returning 2 of the same message

Has anyone had an issue with the chat engine always returning 2 copies of the user's query? I am only passing an empty chat history and a message, but the .chat() function gets the following:
Plain Text
[ChatMessage(role=<MessageRole.USER: 'user'>, content='test\n', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='test\n', additional_kwargs={})]
Plain Text
llama_index/chat_engine/simple.py
@trace_method("chat")
def chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AgentChatResponse:
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    print("==================")
    print(all_messages)
    print("==================")
    chat_response = self._llm.chat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)

    return AgentChatResponse(response=str(chat_response.message.content))

all_messages is the initial chat message 2 times for some reason
uhhh I have not seen that 🤔

Are you modifying the memory outside of the chat engine? Passing in your own chat history?
Yes, I am passing in my chat history; when logging the chat history it is []
Plain Text
...
chat_engine = get_simple_chat_engine(history, model)
response = run_with_fallback(chat_engine.stream_chat, chat_engine.chat, message)
...
def get_simple_chat_engine(history: List[ChatMessage], model: ChatLLM):
    service_context = get_service_context(model=model)
    chat_engine = SimpleChatEngine.from_defaults(
        service_context=service_context, chat_history=history
    )
    return chat_engine

run_with_fallback simply tries to stream; if streaming is not implemented it calls the other fn
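For context, here is a minimal sketch of what a run_with_fallback helper like that could look like, reconstructed from the description above (assuming it only catches NotImplementedError; not the actual code):
Plain Text
def run_with_fallback(primary_fn, fallback_fn, message, **kwargs):
    # Try the streaming variant first; fall back to the blocking call
    # for models whose LLM does not implement streaming.
    try:
        return primary_fn(message, **kwargs)
    except NotImplementedError:
        return fallback_fn(message, **kwargs)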
Debugging it a bit more: interestingly, when chat is called, chat_history is None, when I passed in []
Plain Text
        print(chat_history, message)
        if chat_history is not None:
            self._memory.set(chat_history)
        self._memory.put(ChatMessage(content=message, role="user"))
        initial_token_count = len(
            self._memory.tokenizer_fn(
                " ".join([(m.content or "") for m in self._prefix_messages])
            )
        )
        all_messages = self._prefix_messages + self._memory.get(
            initial_token_count=initial_token_count
        )
I think the troubling line is
self._memory.put(ChatMessage(content=message, role="user"))
we are adding the message to the history, when it's the current message
but don't we want to add the current message to the history?
I think it only does that once
yes but then the actual chat is called with
history = [current_msg]
and
message=current_msg

So when this is translated into a prompt you get something like:
Plain Text
User: Hello
User: Hello
Assistant:
🤔 This is the simple chat engine, right?
Yes, that's the one I'm working with right now, not sure how it works with the "RAG" or "Agent" chat modes

As I am getting errors about the messages not being in sequential user/assistant format when trying to apply a model's chat prompt
I'll see if it breaks similarly with other chat engines shortly
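For context, many model chat templates require strict user/assistant alternation, which is why a message list containing the same user message twice in a row gets rejected. A minimal illustration of that kind of check (hypothetical helper, not LlamaIndex code):
Plain Text
from llama_index.llms import ChatMessage, MessageRole

def check_alternating(messages):
    # Mimics the validation many chat templates perform:
    # consecutive messages must not share the same role.
    for prev, curr in zip(messages, messages[1:]):
        if prev.role == curr.role:
            raise ValueError(f"non-alternating roles: {prev.role} then {curr.role}")

msgs = [
    ChatMessage(role=MessageRole.USER, content="hello"),
    ChatMessage(role=MessageRole.USER, content="hello"),
]
check_alternating(msgs)  # raises ValueError, like the chat-template error described above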
So line 77 sets the chat history (if you passed in [] I would hope this line gets hit lol)

Line 84 gets all messages (in the first chat message, it would be just [user_message])

Then line 88 sends that to the llm 🤔
[Attachment: image.png]
Just trying to understand how the user message ends up there twice lol
Yes, when I pass chat history into the engine it is []
there is something happening converting that [] into None, and I'm not sure where that's happening
if I pass in a chat history with messages already in it, it works just fine, seemingly
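For what it's worth, one common way an empty list silently becomes None is a truthiness check somewhere upstream; a purely illustrative sketch (not taken from the code in this thread):
Plain Text
def forward_history(history=None):
    # "history or None" treats [] as falsy and drops it to None
    return history or None

print(forward_history([]))    # None
print(forward_history(None))  # None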
ok lemme try lol
break points set!
I think I see part of the issue, might be my misuse
Plain Text
>>> from llama_index.chat_engine import SimpleChatEngine
>>> engine = SimpleChatEngine.from_defaults()
>>> engine.chat("Hello!", chat_history=[])
> /Users/loganmarkewich/llama_index_next/llama_index/chat_engine/simple.py(79)chat()
-> if chat_history is not None:
(Pdb) chat_history
[]


Yea, seems to work on my end
I am passing in chat history here:
Plain Text
engine = SimpleChatEngine.from_defaults(history=[])
that is how I've been doing it pretty much across the board
Plain Text
chat_engine = SimpleChatEngine.from_defaults(
    service_context=service_context, chat_history=history
)
it works for the ReAct and OpenAI agents as well as CondensePlusContextChatEngine
it should work here as well
but I guess with the simple chat engine I have to pass it into the chat?
Is history supposed to be empty? Or does it have the initial user message?
[] empty, just this, when I'm passing it
works for me then
Plain Text
>>> from llama_index.chat_engine import SimpleChatEngine
>>> engine = SimpleChatEngine.from_defaults(chat_history=[])
>>> engine.chat("Hello!")
> /Users/loganmarkewich/llama_index_next/llama_index/chat_engine/simple.py(90)chat()
-> chat_response = self._llm.chat(all_messages)
(Pdb) len(all_messages)
1
yea it works still 👀
Okay, I was only passing history into the engine creation
let me try adding it to both
Plain Text
chat_engine = get_simple_chat_engine(history, model)
print("history", history)  # []
response = run_with_fallback(
    chat_engine.stream_chat, chat_engine.chat, message, chat_history=history
)

@trace_method("chat")
def chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AgentChatResponse:
    print("======== start of simple chat =======")
    print(chat_history, message) # [ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})] hello
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    print("==================")
    print(all_messages)
    print("==================")
    chat_response = self._llm.chat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)
    print("======== end of simple chat =======")
    return AgentChatResponse(response=str(chat_response.message.content))
print(chat_history, message) # [ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})] hello
I just don't understand WHY chat_history is not what I'm passing in at this point
I must be losing my mind
it's still getting 2 items in the array also
lol wait, so what did print(all_messages) end up returning? 2 items?
Plain Text
==================
[ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})]
==================
I need to get a debug config set up with Python, as just spamming logs is QUITE annoying.
Is it possible to see if my example above works for you?
Yes, give me just a moment, I'll see if I can create a minimal reproduction that's easier to see
that is the way 🙏
Working on this, but is there a way to select which LLM is used for the train of thought when using the ReAct agent, separate from the one that does the final generation?
Plain Text
response = run_with_fallback(
            primary_fn=chat_engine.stream_chat,
            fallback_fn=chat_engine.chat,
            message=message,
        )
When I'm trying to stream chat, the chat engine adds to the history, then it fails because streaming is not implemented, so when it goes into chat there's already stuff in the memory
lol uhhh that sounds confusing, not sure I fully follow what's happening there
So because some models don't support streaming, I made something that tries to stream and, if it throws NotImplementedError, runs the other.

BUT
engine.stream_chat adds to memory
so when engine.stream_chat calls llm.stream_chat it throws the NotImplementedError
then when we call chat_engine.chat the internal memory has already been modified with the added chat message

I just changed my approach to not use that and it works out, I just didn't expect it to modify internal memory if it didn't succeed
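One way to keep the fallback approach without double-writing memory is to snapshot the engine's memory before the stream attempt and restore it if streaming isn't implemented. A sketch under the assumption that the engine's memory buffer is reachable via the private chat_engine._memory attribute and exposes get_all() and set():
Plain Text
def run_with_fallback(primary_fn, fallback_fn, message, memory, **kwargs):
    # Snapshot the chat memory so a failed stream_chat attempt doesn't
    # leave the user message behind for the subsequent chat() call.
    snapshot = list(memory.get_all())
    try:
        return primary_fn(message, **kwargs)
    except NotImplementedError:
        memory.set(snapshot)  # undo whatever the failed attempt put into memory
        return fallback_fn(message, **kwargs)

# e.g. run_with_fallback(chat_engine.stream_chat, chat_engine.chat, message,
#                        memory=chat_engine._memory)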