Hi, I'm going crazy with the astream_chat implementation

At a glance

The community member is having issues with the astream_chat implementation in their code: every request fails with "'coroutine' object has no attribute 'async_response_gen'". The community members suggest awaiting the chat_engine.astream_chat(query) call and calling async_response_gen() in the async for loop. The community member then hits a further error, "'async for' requires an object with __aiter__ method, got method", and even once the code runs, the response is not streamed token by token.

The community member eventually finds a workaround: running stream_chat in run_in_threadpool and wrapping its generator in an async_wrap_generator helper, which lets them consume a sync generator as an async generator and stream the response concurrently.

Hi, I'm going crazy with the astream_chat implementation. This is my code:

Plain Text
async def generate_response_qa(
    query: str, chat_session_id: str, type: str, db
) -> AsyncGenerator[str, None]:
    memory, message_history, chat_memory = await get_memory_for_session(chat_session_id, db)

    chat_engine = SimpleChatEngine.from_defaults(memory=chat_memory, llm=QA_LLM_MODEL, prefix_messages=[])
    response = chat_engine.astream_chat(query)

    result = ""
    async for token in response.async_response_gen:
        result += token
        yield token

Each time I run a request I get "Error: 'coroutine' object has no attribute 'async_response_gen'". What am I doing wrong here? BTW I'm using llama-index==0.10.51. Any help is welcome. Thank you.
From the error, you never awaited it:
Plain Text
    response = await chat_engine.astream_chat(query)

    result = ""
    async for token in response.async_response_gen():
        result += token
        yield token 
that should work
Hi, I also tried that, but it gives "Error: 'async for' requires an object with __aiter__ method, got method". 😦
Copy my example
you also missed the brackets
async for token in response.async_response_gen():
Plain Text
async def generate_response_qa(
    query: str, chat_session_id: str, type: str, db
) -> AsyncGenerator[str, None]:
    memory, message_history, chat_memory = await get_memory_for_session(chat_session_id, db)

    chat_engine = SimpleChatEngine.from_defaults(memory=chat_memory, llm=QA_LLM_MODEL, prefix_messages=[])
    response = await chat_engine.astream_chat(query)

    result = ""
    async for token in response.async_response_gen():
        result += token
        yield token
Works, but it's not streaming. At this stage the only solution I see is to build a wrapper around the Generator to get an AsyncGenerator.
I'm not sure what you mean by "it's not streaming"? It works for me
this is an async generator no?
Yes, it is an async generator, and I have Server-Sent Events (SSE). If I use stream_chat, at the endpoint/UI I see the content streamed token by token, but with astream_chat I receive all the content in one go, not token by token. This makes no sense. If it's working for you, then the problem is somewhere in my code after this:
Plain Text
    async for token in response.async_response_gen():
        result += token
        yield token

I understand the await at await chat_engine.astream_chat(query) and the async_response_gen(), but in no example from the LlamaIndex documentation do I see an await before chat_engine.astream_chat. Maybe I'm already too tired.
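For context, the two call shapes differ; a minimal sketch, assuming llama-index 0.10.x, where both methods produce a StreamingAgentChatResponse:

Plain Text
# Sync: stream_chat returns the response object directly,
# and response_gen is a plain generator attribute.
response = chat_engine.stream_chat(query)
for token in response.response_gen:
    ...

# Async: astream_chat is a coroutine, so it must be awaited first,
# and async_response_gen() is a method returning an async generator.
response = await chat_engine.astream_chat(query)
async for token in response.async_response_gen():
    ...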
I think the issue is probably on the receiving side of this function
If I had to guess
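For reference, a hypothetical receiving side; the route, the db handling, and the "qa" type argument are assumptions, not from the thread. If the endpoint does not return a streaming response with an SSE media type, or something in between buffers it, tokens can arrive all at once even though the generator yields them one by one:

Plain Text
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
db = None  # placeholder for the thread's db handle

@app.get("/chat")  # hypothetical route
async def chat(query: str, chat_session_id: str):
    async def event_stream():
        # generate_response_qa is the function from this thread;
        # "qa" stands in for its type argument.
        async for token in generate_response_qa(query, chat_session_id, "qa", db):
            yield f"data: {token}\n\n"  # SSE framing: one event per token
    return StreamingResponse(event_stream(), media_type="text/event-stream")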
It's a possibility, and maybe an issue with Ollama 0.2.1 and async. As soon as I find the issue I will share it. Thanks.
Hi, I didn't find the issue, but I have a solution for my case, where I'm using FastAPI:

Plain Text
from starlette.concurrency import run_in_threadpool  # also re-exported by fastapi.concurrency

async def generate_response_qa(
    query: str, chat_session_id: str, type: str, db
) -> AsyncGenerator[str, None]:
    chat_memory = await get_memory_for_session(chat_session_id, db)

    chat_engine = SimpleChatEngine.from_defaults(memory=chat_memory, llm=QA_LLM_MODEL, prefix_messages=[])

    # Run the blocking stream_chat call in the threadpool instead of awaiting astream_chat
    response = await run_in_threadpool(chat_engine.stream_chat, query)

    result = ""
    async for token in async_wrap_generator(response.response_gen):
        result += token
        yield token

where:
Plain Text
import asyncio
from typing import Any, AsyncGenerator, Generator

async def async_wrap_generator(sync_gen: Generator[Any, None, None]) -> AsyncGenerator[Any, None]:
    for value in sync_gen:
        await asyncio.sleep(0)  # Yield control to the event loop
        yield value

With this approach I can use a Generator as an async generator, and it runs concurrently thanks to run_in_threadpool.
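One caveat with this design: the wrapper still calls next() on the sync generator from the event loop, so a slow pull can still block it. An alternative sketch, not from the thread, pulls each item in the threadpool as well; _SENTINEL is a local helper, not a library API:

Plain Text
from typing import Any, AsyncGenerator, Generator
from starlette.concurrency import run_in_threadpool

_SENTINEL = object()  # marks exhaustion without raising StopIteration across the await

async def async_wrap_generator(sync_gen: Generator[Any, None, None]) -> AsyncGenerator[Any, None]:
    while True:
        # next() runs in the threadpool, so a blocking pull never stalls the loop
        value = await run_in_threadpool(next, sync_gen, _SENTINEL)
        if value is _SENTINEL:
            break
        yield value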
that's pretty jank lol but works for me
Sometimes desperation leads to desperate deployment. 🙂 Thank you for your time.