Encountered exception writing response

Encountered exception writing response to history: <asyncio.locks.Event object at 0x2b5558fd0 [unset]> is bound to a different event loop

Anyone ever seen this error?
This error ONLY seems to happen when using SummaryIndex
I have no idea how to fix this one lol
You are probably using tree-summarize with the summary index right?
Try using aquery instead of query
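Something like this, if you're querying the index directly (a minimal sketch, assuming the SummaryIndex you've built; the response_mode and query string are just examples):
Plain Text
# Illustrative sketch, not taken from the original code
query_engine = summary_index.as_query_engine(response_mode="tree_summarize")

# sync path would be: query_engine.query("Summarize the documents")
# async path keeps the work on the caller's already-running event loop:
response = await query_engine.aquery("Summarize the documents")
print(response)
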
Plain Text
agent_or_engine = create_summary_index_agent_from_s3_keys(
    model=model,
    user_id=user_id,
    content_s3_key_extension_pairs=content_s3_key_extension_pairs,
    history=history,
)
....
    summary_index = SummaryIndex.from_documents(
        documents, llm=LLM_INSTANCES[model], embed_model=get_embed_model()
    )
    retriever = get_retriever(
        user_id=user_id,
        model=model,
        index=summary_index,
    )
    chat_engine = CondensePlusContextChatEngine.from_defaults(
        retriever=retriever,
        llm=LLM_INSTANCES[model],
        chat_history=history,
        memory=ChatMemoryBuffer.from_defaults(  # pyright: ignore
            chat_history=history,
            token_limit=128000,  # If memory is omitted, the default token limit is small and an error gets thrown.
        ),
    )

Plain Text
response = await agent_or_engine.astream_chat(message)  # pyright: ignore
I don't think I'm using tree-summarize
Maybe instantiating the chat memory buffer is causing an issue
Also, it starts streaming some content but doesn't finish; halfway through, it encounters the issue
Also, this ONLY happens with Claude
uuuuuuu very sus
I really don't know how to debug this πŸ˜… Might need some google fu
I went deeper, it seems to be an issue directly related to StreamingAgentChatResponse
If I just use .chat it works
.stream_chat and .astream_chat BOTH break

I think the astream_chat method of the Anthropic LLM, or the

response = await self._aclient.messages.create(
    messages=anthropic_messages, system=system_prompt, stream=True, **all_kwargs
)

call, is the issue -- whatever it's doing, when the response or generator comes back it causes something ODD
I can't go deep into the Anthropic client, but if you have the time to do so, I think that's where the issue is
If you can reproduce with just llm.astream_chat(ChatMessage(role="user", content="Hello!")) or similar, we can probably open an issue on the anthropic github
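For reference, a standalone repro along those lines might look like this (a sketch; the model name and package imports are assumptions, not from the thread):
Plain Text
import asyncio

from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic


async def main():
    # model name is just an example
    llm = Anthropic(model="claude-3-opus-20240229")
    stream = await llm.astream_chat([ChatMessage(role="user", content="Hello!")])
    async for chunk in stream:
        print(chunk.delta, end="", flush=True)


asyncio.run(main())

If that fails on its own, the problem likely lives in the Anthropic client; if it streams fine, it points back at how the chat engine drives the stream.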
Alright, I'll look to do that as soon as I can
Seeing this on AzureOpenAI() models, with endpoints behind FastAPI, as well.
I think it's related to the async calls being made in the astream_chat endpoints. It's causing things to happen in another event_loop relative to the FastAPI event_loop. I'm still testing.

https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/condense_question.py#L362-L365
Whoops. wrong chat engine link
note the asyncio.run() call made within the Thread
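Roughly the pattern in question, heavily simplified (not the exact llama_index source; the names here are just for illustration):
Plain Text
import asyncio
from threading import Thread

async def write_to_history(memory):
    # awaits asyncio primitives (e.g. an Event) that were created
    # on the original FastAPI event loop
    ...

def start_background_write(memory):
    # asyncio.run() inside a Thread spins up a *new* event loop, so anything
    # bound to the original loop is now being awaited from a different one
    thread = Thread(target=lambda: asyncio.run(write_to_history(memory)))
    thread.start()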
I don't think this is a llamaindex problem, per se, but an integration snafu. I'll try and get some more details over the next couple hours πŸ™‚
right -- I wasn't sure how else to run an async method in a thread like this πŸ˜…
@edhenry Yes, unfortunately I had to switch our whole application over to the sync methods until I have another chance to dive deeper into the async methods and fix them
Some prelim testing, changing this threading call: https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/condense_plus_context.py#L355-L359

to something like:

Plain Text
if self._memory:
    # Schedule the history write on the event loop that is already running
    # the request, instead of spawning a Thread + asyncio.run() (which binds
    # the response's asyncio primitives to a second loop)
    asyncio.create_task(
        chat_response.awrite_response_to_history(self._memory)
    )
    chat_response._ensure_async_setup()
    await chat_response._is_function_false_event.wait()


Seems like it might fix it, but I'm still learning about asyncio, myself. Thoughts @Logan M ?
Will that allow you to still stream the response while it's writing to history? Hard to say without trying it, I suppose haha, but I would confirm that
a) streaming still works
b) the chat_history is updated properly when the streaming is complete
if so to both, then it seems like an acceptable fix πŸ’ͺ
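For what it's worth, a quick check for both could look like this (a sketch; chat_engine is the CondensePlusContextChatEngine from earlier, and chat_history is assumed to be its memory-backed property):
Plain Text
response = await chat_engine.astream_chat("Summarize the documents")

# (a) streaming still works: tokens keep arriving until the stream finishes
async for token in response.async_response_gen():
    print(token, end="", flush=True)

# (b) once the stream is consumed, the assistant reply should be in history
print(chat_engine.chat_history[-1])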
a) βœ…
b) βœ…
I'll get a PR raised for this soon πŸ™‚
Awesome! Thanks a ton for debugging this and trying it out -- I'll be trying this out myself when the PR is open 🙂