
I'm working with ReActAgent, and when I do a .chat it works as expected, but if I do a .stream_chat and then .print_response_stream, it streams the text of the first chain-of-thought step instead of the final response. Did I miss a step somewhere?
Nah, that's how it works at the moment 😒 It's very hard to know when it's streaming the final response
My best guess is you can filter the stream
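Something along these lines, buffering until the final-answer marker shows up (a sketch; the "Answer:" marker and the helper name are assumptions, not library API):

Plain Text
def final_answer_only(token_gen):
    # Buffer the stream until the "Answer:" marker appears, then pass
    # everything after it straight through.
    buffer = ""
    emitting = False
    for token in token_gen:
        if emitting:
            yield token
            continue
        buffer += token
        if "Answer:" in buffer:
            yield buffer.split("Answer:", 1)[1]
            emitting = True

response = chat_engine.stream_chat("What are the main supply routes?")
for token in final_answer_only(response.response_gen):
    print(token, end="", flush=True)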
No worries. Unfortunately the stream ends after compiling the first task step, so the content I'd filter for never comes. It's not a huge deal in the short term to use the static response instead.
Actually, I'm not able to reproduce this behaviour:
Plain Text
>>> from llama_index.core import VectorStoreIndex, Document
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> chat_engine = index.as_chat_engine(chat_mode="react")
>>> response = chat_engine.stream_chat("Tell me a fact about LLMs?")
>>> for token in response.response_gen:
...   print(token, end="", flush=True)
... 
LLMs are pre-trained on large amounts of publicly available data, making them a powerful tool for knowledge generation and reasoning. 
I'm using tools though.
Plain Text
from typing import List

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import BaseTool, QueryEngineTool, ToolMetadata

# Wrap an existing query engine as a tool for the agent
tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="tool0",
        description="<omitted>",
    ),
)

query_engine_tools: List[BaseTool] = [tool]

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    max_iterations=20,
)


And the output I get is the thought chain from when it exercises the tool to answer the query, instead of the final response.
The idea being that I was going to add more tools later, etc.
The above is using tools (a single query engine tool).
as_chat_engine(chat_mode="react") is essentially the same as your code: it sets up a query engine tool with the index and adds it to a ReAct agent.
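Schematically it does something like this under the hood (a sketch, not the exact library code; the tool name and description are illustrative):

Plain Text
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

query_engine = index.as_query_engine()
tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="query_engine_tool",
    description="Useful for answering questions about the indexed documents.",
)
agent = ReActAgent.from_tools([tool], llm=llm)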
Here's a react notebook that more closely matches what you are doing, and it works the same
https://colab.research.google.com/drive/1rs0jpr0vbNocBmTYFGUsJ4S-8YdQEb5v?usp=sharing
Thanks for taking the time to look at that. I took another look, adding a debugging node postprocessor that just tells me what it pulled from the index.

The stream_chat call never retrieves nodes; the chat call does. Same agent making the same query in both cases.
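For reference, the debug postprocessor is roughly this (a minimal sketch against the BaseNodePostprocessor interface; the class name is illustrative):

Plain Text
from typing import List, Optional

from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle

class DebugNodePostprocessor(BaseNodePostprocessor):
    """Prints what the retriever pulled from the index; passes nodes through unchanged."""

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        print("Retrieved nodes:")
        for n in nodes:
            print(f"  + {len(n.node.get_content())} character node")
        return nodes

# Attach it to the query engine that backs the tool
query_engine = index.as_query_engine(node_postprocessors=[DebugNodePostprocessor()])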

Plain Text
------- streaming
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
Thought: The user is asking about main supply routes, which seems like a general question. I need to use a tool to help me answer this question.

Action: oplan
Action Input: {"input": "main supply routes", "type": "object"}

------- non-streaming
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
Thought: The user is asking about the main supply routes. I need to use a tool to help me answer the question.
Action: oplan
Action Input: {'input': 'main supply routes', 'type': 'object'}
Batches: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 11.57it/s]
Retrieved nodes:
  + 1625 character node
  + 821 character node
  + 1927 character node
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"

... text for the answer here ...
I don't get the tool output generated by setting verbose=True for the streaming call, but I do see it (in magenta in my terminal) for the non-streaming version. I don't think the tool is running.
Plain Text
from llama_index.core.agent import ReActAgent
from llama_index.core.chat_engine.types import (
    AgentChatResponse,
    StreamingAgentChatResponse,
)
from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="oplan",
    description="Provides logistical information for general operations including resources, locations, and staffing.",
)

agent = ReActAgent.from_tools([tool], verbose=True)

query = "What are the main supply routes?"
for i in range(0, 5):
    # Streaming: tokens arrive incrementally via response_gen
    print("------- streaming")
    agent.reset()
    res0: StreamingAgentChatResponse = agent.stream_chat(query)
    for token in res0.response_gen:
        print(token, end="", flush=True)
    print()

    # Non-streaming: the full response is available once chat() returns
    print("------- non-streaming")
    agent.reset()
    res1: AgentChatResponse = agent.chat(query)
    print(res1.response)
I dug in a bit more: for some reason there is a leading space on the front of the chunk (e.g. " Thought") when _infer_stream_chunk_is_final runs. This code obviously doesn't run with the non-streaming version of _run_step.

I changed the test at ReActAgentWorker:465 to the following (adding the .strip()) and it worked:

Plain Text
if len(latest_content) > len("Thought") and not latest_content.strip().startswith(
    "Thought"
):
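For context, the surrounding check in _infer_stream_chunk_is_final then reads roughly like this (a paraphrase of the logic with the fix applied, not the exact library source):

Plain Text
def _infer_stream_chunk_is_final(chunk) -> bool:
    # True once the streamed text stops looking like a "Thought: ..." step,
    # i.e. the agent has started writing its final response.
    latest_content = chunk.message.content
    if latest_content:
        if len(latest_content) > len("Thought") and not latest_content.strip().startswith(
            "Thought"
        ):
            # Doesn't follow the thought-action format
            return True
        if "Answer: " in latest_content:
            return True
    return False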