Return direct

Hi all, I was wondering if any of you have worked with ReActAgents. I have tried the return_direct argument. In the backend, I can see the "Observation" from the agent that contains the final answer, but I then have to wait for the exact same answer to be streamed to the frontend. Do you understand why?

Plain Text
from llama_index.core import Settings
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# return_direct=True makes the tool's output the final answer,
# skipping the agent's usual synthesis step.
top_level_sub_tools = [
    QueryEngineTool.from_defaults(
        query_engine=qualitative_question_engine,
        name="qualitative_question_engine",
        description="""
            A query engine that can answer qualitative questions about documents
            """.strip(),
        return_direct=True,
    ),
]

chat_engine = ReActAgent.from_tools(
    tools=top_level_sub_tools,
    llm=chat_llm,
    chat_history=chat_history,
    verbose=True,
    callback_manager=Settings.callback_manager,
    max_function_calls=1,
)
    
5 comments
That's exactly what return_direct does. When that tool is called, its output is returned directly to the user.

It has to be streamed because you are calling stream_chat.
Thanks for your answer. Yes, I'm using astream_chat, so I might be misunderstanding something: the answer has already been received in the backend (in the Observation), and then the streaming to the frontend is quite slow (~3 tokens/s). I'm using SSE, with the same code as in sec-insights.
Yeah, probably this sleep for the dummy stream should be faster:
https://github.com/run-llama/llama_index/blob/723c2533ed4b7b43b7d814c89af1838f0f1994c2/llama-index-core/llama_index/core/chat_engine/types.py#L92

But also, the response is technically already fully there; you could just check for it and return the whole thing, or fake-stream it yourself.
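
For illustration, a minimal sketch of that fake-streaming idea, assuming a FastAPI SSE endpoint (as in sec-insights) and the chat_engine from the snippet above. Reading the finished answer from response.sources (a list of ToolOutput, whose .content holds the tool's output) is an assumption based on the linked types.py; verify it against your installed llama-index version:

Plain Text
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

CHUNK_SIZE = 40  # characters per SSE event; purely cosmetic

async def fake_stream(text: str):
    # Re-chunk an already-complete answer ourselves instead of relying
    # on the agent's token-by-token dummy stream and its sleep.
    for i in range(0, len(text), CHUNK_SIZE):
        yield f"data: {text[i:i + CHUNK_SIZE]}\n\n"
        await asyncio.sleep(0)  # yield control without throttling

@app.post("/chat")
async def chat(message: str):
    response = await chat_engine.astream_chat(message)
    if response.sources:
        # With return_direct, the tool's Observation *is* the final answer
        # (this assumes every tool is return_direct, as in the question),
        # so send it immediately instead of waiting on the slow stream.
        return StreamingResponse(
            fake_stream(response.sources[-1].content),
            media_type="text/event-stream",
        )
    # Fallback: pass the agent's own token stream through as-is.
    async def passthrough():
        async for token in response.async_response_gen():
            yield f"data: {token}\n\n"
    return StreamingResponse(passthrough(), media_type="text/event-stream")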
Thanks for pointing me in the right direction, I'll have a look at it!
I've tried something like that, and it works to patch the delay. Maybe not the most elegant way, but it's enough for testing.
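
Something in this spirit, perhaps (a hypothetical reconstruction, not the poster's actual code): shrinking time.sleep while testing, on the assumption that the delay comes from the sleep in the linked chat_engine/types.py. Since that module uses the shared time module, this patches time.sleep globally, which is crude but fine for a quick experiment:

Plain Text
import time
from unittest.mock import patch

_original_sleep = time.sleep

def _fast_sleep(seconds: float) -> None:
    # Cap any artificial delay at 1 ms instead of removing it outright.
    _original_sleep(min(seconds, 0.001))

_patcher = patch("time.sleep", _fast_sleep)
_patcher.start()          # keep active while testing
# ... run the chat engine / SSE endpoint as usual ...
# _patcher.stop()         # restore the real sleep when done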