
I'm working with ReActAgent, and when I do a .chat it works as expected, but if I do a .stream_chat and then .print_response_stream, it streams the text of the first chain-of-thought step instead of the final response. Did I miss a step somewhere?
Nah, that's how it works at the moment 😒 It's very hard to know when it's streaming the final response
My best guess is you can filter the stream
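Something along these lines, buffering until the final-answer marker shows up (a sketch; the "Answer:" marker and the helper name are assumptions, not library API):

Plain Text
def final_answer_only(token_gen):
    # Buffer the stream until the "Answer:" marker appears, then pass
    # everything after it straight through.
    buffer = ""
    emitting = False
    for token in token_gen:
        if emitting:
            yield token
            continue
        buffer += token
        if "Answer:" in buffer:
            yield buffer.split("Answer:", 1)[1]
            emitting = True

response = chat_engine.stream_chat("What are the main supply routes?")
for token in final_answer_only(response.response_gen):
    print(token, end="", flush=True)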
No worries. Unfortunately the stream ends after compiling the first task step, so the content I'd filter for never comes. It's not a huge deal in the short term to use the static response instead.
Actually, I'm not able to reproduce this behaviour:
Plain Text
>>> from llama_index.core import VectorStoreIndex, Document
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> chat_engine = index.as_chat_engine(chat_mode="react")
>>> response = chat_engine.stream_chat("Tell me a fact about LLMs?")
>>> for token in response.response_gen:
...   print(token, end="", flush=True)
... 
LLMs are pre-trained on large amounts of publicly available data, making them a powerful tool for knowledge generation and reasoning. 
I'm using tools though.
Plain Text
from typing import List

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import BaseTool, QueryEngineTool, ToolMetadata

# Wrap an existing query engine as a tool for the agent
tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="tool0",
        description="<omitted>",
    ),
)

query_engine_tools: List[BaseTool] = [tool]

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    max_iterations=20,
)


And the output I get is the thought chain from when it exercises the tool to answer the query, instead of the final response.
The idea being that I was going to add more tools later, etc.
The above is using tools (a single query engine tool).
as_chat_engine(chat_mode="react") is essentially the same as your code: it sets up a query engine tool with the index and adds it to a ReAct agent.
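Schematically it does something like this under the hood (a sketch, not the exact library code; the tool name and description are illustrative):

Plain Text
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

query_engine = index.as_query_engine()
tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="query_engine_tool",
    description="Useful for answering questions about the indexed documents.",
)
agent = ReActAgent.from_tools([tool], llm=llm)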
Here's a react notebook that more closely matches what you are doing, and it works the same
https://colab.research.google.com/drive/1rs0jpr0vbNocBmTYFGUsJ4S-8YdQEb5v?usp=sharing
Thanks for taking the time to look at that. I took another look, adding a debugging node postprocessor that just tells me what it pulled from the index.

The stream_chat call never retrieves nodes; the chat call does. Same agent making the same query in both cases.
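For reference, the debug postprocessor is roughly this (a minimal sketch against the BaseNodePostprocessor interface; the class name is illustrative):

Plain Text
from typing import List, Optional

from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle

class DebugNodePostprocessor(BaseNodePostprocessor):
    """Prints what the retriever pulled from the index; passes nodes through unchanged."""

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        print("Retrieved nodes:")
        for n in nodes:
            print(f"  + {len(n.node.get_content())} character node")
        return nodes

# Attach it to the query engine that backs the tool
query_engine = index.as_query_engine(node_postprocessors=[DebugNodePostprocessor()])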

Plain Text
------- streaming
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
Thought: The user is asking about main supply routes, which seems like a general question. I need to use a tool to help me answer this question.

Action: oplan
Action Input: {"input": "main supply routes", "type": "object"}

------- non-streaming
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
Thought: The user is asking about the main supply routes. I need to use a tool to help me answer the question.
Action: oplan
Action Input: {'input': 'main supply routes', 'type': 'object'}
Batches: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 11.57it/s]
Retrieved nodes:
  + 1625 character node
  + 821 character node
  + 1927 character node
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"
HTTP Request: POST http://cassini:8889/v1/completions "HTTP/1.1 200 OK"

... text for the answer here ...
I don't get the tool output generated by setting verbose=True for the streaming call, but I do see it (in magenta in my terminal) for the non-streaming version. I don't think the tool is running.
Plain Text
from llama_index.core.agent import ReActAgent
from llama_index.core.chat_engine.types import (
    AgentChatResponse,
    StreamingAgentChatResponse,
)
from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="oplan",
    description="Provides logistical information for general operations including resources, locations, and staffing.",
)

agent = ReActAgent.from_tools([tool], verbose=True)

query = "What are the main supply routes?"
for i in range(0, 5):
    # Streaming: tokens arrive incrementally via response_gen
    print("------- streaming")
    agent.reset()
    res0: StreamingAgentChatResponse = agent.stream_chat(query)
    for token in res0.response_gen:
        print(token, end="", flush=True)
    print()

    # Non-streaming: the full response is available once chat() returns
    print("------- non-streaming")
    agent.reset()
    res1: AgentChatResponse = agent.chat(query)
    print(res1.response)
I dug in a bit more: for some reason there is a leading space on the front of the chunk (e.g. " Thought") when _infer_stream_chunk_is_final runs. This code obviously doesn't run with the non-streaming version of _run_step.

I changed the test at ReActAgentWorker:465 to the following (adding the .strip()) and it worked:

Plain Text
if len(latest_content) > len("Thought") and not latest_content.strip().startswith(
    "Thought"
):
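For context, the surrounding check in _infer_stream_chunk_is_final then reads roughly like this (a paraphrase of the logic with the fix applied, not the exact library source):

Plain Text
def _infer_stream_chunk_is_final(chunk) -> bool:
    # True once the streamed text stops looking like a "Thought: ..." step,
    # i.e. the agent has started writing its final response.
    latest_content = chunk.message.content
    if latest_content:
        if len(latest_content) > len("Thought") and not latest_content.strip().startswith(
            "Thought"
        ):
            # Doesn't follow the thought-action format
            return True
        if "Answer: " in latest_content:
            return True
    return False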