issues with chat returning 2 of the same message

Has anyone had an issue with the chat engine always returning 2 copies of the user's query? I am only passing an empty chat history and a message, but the .chat() function gets the following:
Plain Text
[ChatMessage(role=<MessageRole.USER: 'user'>, content='test\n', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='test\n', additional_kwargs={})]
Plain Text
llama_index/chat_engine/simple.py
    @trace_method("chat")
    def chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> AgentChatResponse:
        if chat_history is not None:
            self._memory.set(chat_history)
        self._memory.put(ChatMessage(content=message, role="user"))
        initial_token_count = len(
            self._memory.tokenizer_fn(
                " ".join([(m.content or "") for m in self._prefix_messages])
            )
        )
        all_messages = self._prefix_messages + self._memory.get(
            initial_token_count=initial_token_count
        )
        print("==================")
        print(all_messages)
        print("==================")
        chat_response = self._llm.chat(all_messages)
        ai_message = chat_response.message
        self._memory.put(ai_message)

        return AgentChatResponse(response=str(chat_response.message.content))

all_messages ends up being the initial chat message 2 times for some reason
uhhh I have not seen that 🤔

Are you modifying the memory outside of the chat engine? Passing in your own chat history?
Yes, I am passing in my chat history; when logging the chat history it is []
Plain Text
...
chat_engine = get_simple_chat_engine(history, model)
response = run_with_fallback(chat_engine.stream_chat, chat_engine.chat, message)
...
def get_simple_chat_engine(history: List[ChatMessage], model: ChatLLM):
    service_context = get_service_context(model=model)
    chat_engine = SimpleChatEngine.from_defaults(
        service_context=service_context, chat_history=history
    )
    return chat_engine

run_with_fallback simply tries to stream; if that's not implemented, it calls the other fn
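Roughly like this (a minimal sketch of the helper; the real run_with_fallback isn't shown in this thread, so the exact signature is assumed):
Plain Text
def run_with_fallback(primary_fn, fallback_fn, message, **kwargs):
    # Try the streaming call first; if the LLM doesn't implement streaming,
    # fall back to the regular (non-streaming) call.
    try:
        return primary_fn(message, **kwargs)
    except NotImplementedError:
        return fallback_fn(message, **kwargs)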
debugging it a bit more: interestingly, when chat is called, chat_history is None, even though I passed in []
Plain Text
        print(chat_history, message)
        if chat_history is not None:
            self._memory.set(chat_history)
        self._memory.put(ChatMessage(content=message, role="user"))
        initial_token_count = len(
            self._memory.tokenizer_fn(
                " ".join([(m.content or "") for m in self._prefix_messages])
            )
        )
        all_messages = self._prefix_messages + self._memory.get(
            initial_token_count=initial_token_count
        )
I think the troubling line is
self._memory.put(ChatMessage(content=message, role="user"))
we are adding the message to the history, when it's the current message
but don't we want to add the current message to the history?
I think it only does that once
yes but then the actual chat is called with
history = [current_msg]
and
message=current_msg

So when this is translated into a prompt you get something like:
Plain Text
User: Hello
User: Hello
Assistant:
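Illustrating the suspected path in plain Python (a rough sketch, not the engine's actual memory class): if the current message is already in the history handed to chat(), memory.set() stores it once and memory.put() adds it again.
Plain Text
history = ["user: Hello"]            # the current message was already appended upstream
message = "Hello"

memory = list(history)               # chat() does self._memory.set(chat_history)
memory.append(f"user: {message}")    # then self._memory.put(...) adds it again

print(memory)                        # ['user: Hello', 'user: Hello'] -> duplicated user turn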
🤔 This is the simple chat engine, right?
Yes, that's the one I'm working with right now, not sure how it works with the "RAG" or "Agent" chat modes

as I am getting errors about the messages not being in sequential user/assistant format when trying to apply a model's chat prompt
I'll see if it breaks similarly with other chat engines shortly
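For context on that error, many chat templates validate that roles alternate, so a duplicated user turn trips that check. Purely illustrative, not any specific model's template:
Plain Text
messages = [
    {"role": "user", "content": "hello"},
    {"role": "user", "content": "hello"},  # duplicated user turn
]
# Many chat templates enforce alternating roles, roughly like this:
for prev, curr in zip(messages, messages[1:]):
    if prev["role"] == curr["role"]:
        raise ValueError("conversation roles must alternate user/assistant")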
So line 77 sets the chat history (if you passed in [], I would hope this line gets hit lol)

Line 84 gets all messages (in the first chat message, it would be just [user_message])

Then line 88 sends that to the llm 🤔
Attachment: image.png
Just trying to understand how the user message ends up there twice lol
yes, when I pass the chat history into the engine it is []
something is happening that converts that [] into None, and I'm not sure where that's happening
if I pass in a chat history with messages already in it, it seemingly works just fine
ok lemme try lol
breakpoints set!
I think I see part of the issue, might be my misuse
Plain Text
>>> from llama_index.chat_engine import SimpleChatEngine
>>> engine = SimpleChatEngine.from_defaults()
>>> engine.chat("Hello!", chat_history=[])
> /Users/loganmarkewich/llama_index_next/llama_index/chat_engine/simple.py(79)chat()
-> if chat_history is not None:
(Pdb) chat_history
[]


Yea, seems to work on my end
I am passing in chat history here:
Plain Text
engine = SimpleChatEngine.from_defaults(history=[])
that is how I've been doing it pretty much across the board
Plain Text
chat_engine = SimpleChatEngine.from_defaults(
    service_context=service_context, chat_history=history
)
it works for the ReAct and OpenAI agents as well as CondensePlusContextChatEngine
it should work here as well
but I guess with the simple chat engine I have to pass it into the chat?
Is history supposed to be empty? Or it has the initial user message?
[], empty, just this, when I'm passing it
works for me then
Plain Text
>>> from llama_index.chat_engine import SimpleChatEngine
>>> engine = SimpleChatEngine.from_defaults(chat_history=[])
>>> engine.chat("Hello!")
> /Users/loganmarkewich/llama_index_next/llama_index/chat_engine/simple.py(90)chat()
-> chat_response = self._llm.chat(all_messages)
(Pdb) len(all_messages)
1
yea, it still works 👀
okay, I was only passing history into the engine creation
let me try adding it to both
Plain Text
chat_engine = get_simple_chat_engine(history, model)
print("history", history) # []
response = run_with_fallback(
  chat_engine.stream_chat, chat_engine.chat, message, chat_history=history
)

@trace_method("chat")
def chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AgentChatResponse:
    print("======== start of simple chat =======")
    print(chat_history, message) # [ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})] hello
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    print("==================")
    print(all_messages)
    print("==================")
    chat_response = self._llm.chat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)
    print("======== end of simple chat =======")
    return AgentChatResponse(response=str(chat_response.message.content))
print(chat_history, message) # [ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})] hello
I just don't understand WHY chat_history is not what I'm passing in at this point
I must be losing my mind
it's still getting 2 items in the array also
lol wait, so what did print(all_messages) end up returning? 2 items?
Plain Text
==================
[ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})]
==================
I need to get a debug config set up with Python, as just spamming logs is QUITE annoying.
Is it possible to see if my example above works for you?
yes, give me just a moment, I'll see if I can create a minimal reproduction that's easier to see
that is the way 🙏
working on this, but is there a way to select which LLM is used for the train of thought when using the ReAct agent, separate from the one that does the final generation?
Plain Text
response = run_with_fallback(
            primary_fn=chat_engine.stream_chat,
            fallback_fn=chat_engine.chat,
            message=message,
        )
when I'm trying to stream chat, the chat engine adds to the history, then it fails because it's not implemented, so when it goes into chat there's already stuff in the memory
lol uhhh that sounds confusing, not sure I fully follow what's happening there
so because some models don't support streaming, I made something that tries to stream; if it throws NotImplementedError, it runs the other.

BUT
engine.stream_chat adds to memory
so when engine.stream_chat calls llm.stream_chat, it throws the NotImplementedError
then when we call chat_engine.chat, the internal memory has already been modified with the added chat message

I just changed my approach to not use that and it works out, I just didn't expect it to modify the internal memory if it didn't succeed
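One way around that, as a sketch: snapshot the history before attempting to stream, and restore it if the streaming call fails. This assumes the engine exposes a chat_history property and that the memory's set() behaves as in the snippets above; it's not the approach actually taken here.
Plain Text
def run_with_fallback(chat_engine, message):
    # Copy the history before stream_chat mutates the engine's memory.
    snapshot = list(chat_engine.chat_history)
    try:
        return chat_engine.stream_chat(message)
    except NotImplementedError:
        # stream_chat already put `message` into memory before raising,
        # so restore the pre-call history to avoid the duplicated user turn.
        chat_engine._memory.set(snapshot)
        return chat_engine.chat(message)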