llama_index/chat_engine/simple.py
@trace_method("chat")
def chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AgentChatResponse:
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    print("==================")
    print(all_messages)
    print("==================")
    chat_response = self._llm.chat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)
    return AgentChatResponse(response=str(chat_response.message.content))
all_messages is the initial chat message 2 times for some reason
uhhh I have not seen that 🤔
Are you modifying the memory outside of the chat engine? Passing in your own chat history?
Yes I am passing in my chat history; when logging chat history it is []
...
chat_engine = get_simple_chat_engine(history, model)
response = run_with_fallback(chat_engine.stream_chat, chat_engine.chat, message)
...
def get_simple_chat_engine(history: List[ChatMessage], model: ChatLLM):
    service_context = get_service_context(model=model)
    chat_engine = SimpleChatEngine.from_defaults(
        service_context=service_context, chat_history=history
    )
    return chat_engine
run_with_fallback simply tries to stream; if that's not implemented it calls the other fn
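something like this, roughly (a sketch of the helper; run_with_fallback is my own code, not a llama_index API):
from typing import Any, Callable

def run_with_fallback(primary_fn: Callable[..., Any], fallback_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    # Try the streaming path first; if the LLM doesn't support it,
    # fall back to the blocking chat call with the same arguments.
    try:
        return primary_fn(*args, **kwargs)
    except NotImplementedError:
        return fallback_fn(*args, **kwargs)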
debugging it a bit more, interestingly when chat is called, chat_history is None, when I passed in []
print(chat_history, message)
if chat_history is not None:
    self._memory.set(chat_history)
self._memory.put(ChatMessage(content=message, role="user"))
initial_token_count = len(
    self._memory.tokenizer_fn(
        " ".join([(m.content or "") for m in self._prefix_messages])
    )
)
all_messages = self._prefix_messages + self._memory.get(
    initial_token_count=initial_token_count
)
i think the troubling line is
self._memory.put(ChatMessage(content=message, role="user"))
we are adding the message to the history, when it's the current message
but don't we want to add the current message to the history?
I think it only does that once
yes but then the actual chat is called with
history = [current_msg]
and
message=current_msg
So when this is translated into a prompt you get something like:
User: Hello
User: Hello
Assistant:
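in other words, roughly this ends up in memory (just a sketch of the set/put sequence, assuming the default ChatMemoryBuffer; not the exact engine internals):
from llama_index.llms import ChatMessage
from llama_index.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults()
history = [ChatMessage(role="user", content="Hello")]  # caller history already holds the current message

memory.set(history)                                    # memory: [Hello]
memory.put(ChatMessage(role="user", content="Hello"))  # memory: [Hello, Hello]
print(memory.get())                                    # two identical user messages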
🤔 This is the simple chat engine, right?
Yes, that's the one I'm working with right now, not sure how it works with "RAG" or "Agent" chat modes
As I am getting errors about the messages not being in sequential user/assistant format when trying to apply a model's chat prompt
I'll see if it breaks similarly with other chat engines shortly
So line 77 sets the chat history (if you passed in [])
I would hope this line gets hit lol
Line 84 gets all messages (in the first chat message, it would be just [user_message])
Then line 88 sends that to the llm 🤔
Just trying to understand how the user message ends up there twice lol
yes when i pass chat history into the engine it is []
there is something happening, converting that [] into None, and I'm not sure where that's happening
if i pass in a chat history with messages already in it, it works just fine, seemingly
i think i see part of the issue, might be my misuse
>>> from llama_index.chat_engine import SimpleChatEngine
>>> engine = SimpleChatEngine.from_defaults()
>>> engine.chat("Hello!", chat_history=[])
> /Users/loganmarkewich/llama_index_next/llama_index/chat_engine/simple.py(79)chat()
-> if chat_history is not None:
(Pdb) chat_history
[]
Yea, seems to work on my end
I am passing in chat history here:
engine = SimpleChatEngine.from_defaults(history=[])
that is how I've been doing it pretty much across the board
chat_engine = SimpleChatEngine.from_defaults(
    service_context=service_context, chat_history=history
)
it works for the ReAct and OpenAI agents as well as CondensePlusContextChatEngine
it should work here as well
but I guess with the simple chat engine I have to pass it into the chat?
Is history supposed to be empty? Or it has the initial user message?
[] empty, just this, when I'm passing it
works for me then
>>> from llama_index.chat_engine import SimpleChatEngine
>>> engine = SimpleChatEngine.from_defaults(chat_history=[])
>>> engine.chat("Hello!")
> /Users/loganmarkewich/llama_index_next/llama_index/chat_engine/simple.py(90)chat()
-> chat_response = self._llm.chat(all_messages)
(Pdb) len(all_messages)
1
okay i was only passing history into the engine creation
let me try adding it to both
chat_engine = get_simple_chat_engine(history, model)
print("history", history)  # []
response = run_with_fallback(
    chat_engine.stream_chat, chat_engine.chat, message, chat_history=history
)
@trace_method("chat")
def chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AgentChatResponse:
    print("======== start of simple chat =======")
    print(chat_history, message)  # [ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})] hello
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    print("==================")
    print(all_messages)
    print("==================")
    chat_response = self._llm.chat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)
    print("======== end of simple chat =======")
    return AgentChatResponse(response=str(chat_response.message.content))
print(chat_history, message) # [ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})] hello
I just don't understand WHY chat_history is not what I'm passing in at this point
it's still getting 2 items in the array also
lol wait, so what did print(all_messages)
end up returning? 2 items?
==================
[ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='hello\n', additional_kwargs={})]
==================
I need to get a debug config set up with Python, as just spamming logs is QUITE annoying.
Is it possible to see if my example above works for you?
yes give me just a moment, I'll see if I can create a minimal reproduction that's easier to see
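probably something in this direction (a sketch, not tested; it just ends up feeding the current message in both chat_history and message the way my wrapper does):
from llama_index.chat_engine import SimpleChatEngine
from llama_index.llms import ChatMessage

history = [ChatMessage(role="user", content="hello")]  # already contains the message about to be sent
engine = SimpleChatEngine.from_defaults(chat_history=history)

# chat() does memory.set(history) and then memory.put(message),
# so the LLM sees the same user message twice
response = engine.chat("hello", chat_history=history)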
working on this, but is there a way to select which LLM is used for the train of thought when using the ReAct agent, separate from the one that does the final generation?
response = run_with_fallback(
    primary_fn=chat_engine.stream_chat,
    fallback_fn=chat_engine.chat,
    message=message,
)
when I'm trying to stream chat, the chat engine adds to history, then it fails because it's not implemented, so when it goes into chat there's already stuff in the memory
lol uhhh that sounds confusing, not sure I fully follow what's happening there
so because some models don't support streaming, I made something that tries to stream; if it throws NotImplementedError, it runs the other.
BUT
engine.stream_chat adds to memory
so when engine.stream_chat calls llm.stream_chat it throws NotImplementedError
then when we call chat_engine.chat the internal memory has already been modified with the added chat message
I just changed my approach to not use that and it works out, just didn't expect it to modify internal memory if it didn't succeed
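for anyone hitting the same thing, one workaround is a sketch like this (it leans on the fact shown above that chat()/stream_chat() re-seed memory via memory.set() when chat_history is passed; not an official llama_index helper):
def run_with_fallback(chat_engine, message, chat_history):
    # Pass copies of the caller's history so a failed stream_chat() doesn't
    # leave its extra user message in the engine's internal memory.
    try:
        return chat_engine.stream_chat(message, chat_history=list(chat_history))
    except NotImplementedError:
        # Re-seeding with the original history wipes whatever the failed
        # stream_chat() already put into memory before raising.
        return chat_engine.chat(message, chat_history=list(chat_history))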