Quick update on this: I tried making changes similar to PR#10407 linked above. Unfortunately, that just caused it to hang, and the error didn't propagate up to the FastAPI app, which explains this comment:
https://github.com/run-llama/llama_index/blob/0ee041efadeccb9884052cb393ed5e1dd7b83678/llama-index-core/llama_index/core/chat_engine/types.py#L176
I'm guessing that's because awrite_response_to_history is called within asyncio.create_task, so an exception raised there stays on the task instead of reaching the caller.
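For anyone who wants to see the mechanism in isolation, here is a minimal sketch that does not use llama_index at all. The names `producer` and `consume` are hypothetical stand-ins, and it assumes the background coroutine fails before it puts anything on the queue:

```python
import asyncio


async def producer(queue: asyncio.Queue) -> None:
    # Stand-in for a background coroutine that fails before
    # it puts anything on the queue.
    raise RuntimeError("LLM call failed")


async def consume() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    # The exception is stored on the task; it is not raised here.
    task = asyncio.create_task(producer(queue))
    try:
        # Nothing ever arrives, so without a timeout this would wait forever.
        await asyncio.wait_for(queue.get(), timeout=0.2)
        return "got item"
    except asyncio.TimeoutError:
        # The error only surfaces once the task itself is inspected or awaited.
        return f"hung; task failed with: {task.exception()!r}"


result = asyncio.run(consume())
print(result)
```

The consumer never sees the `RuntimeError` directly; it only notices that no item shows up, which looks like a hang from the FastAPI side.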
I worked around the issue with something like:
    await response._new_item_event.wait()
    if response._aqueue.empty():
        raise HTTPException(500)
    # call response.async_response_gen()
It feels very hacky though since it's accessing "private" fields and relying on internal implementation details of StreamingAgentChatResponse.
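If it helps, the check can at least be isolated in one helper so the private access lives in a single place. This is only a sketch: `FakeStreamingResponse` and `has_first_item` are hypothetical names, the fake class mimics just the two private fields mentioned above, and it assumes the event is also set when the producer fails without queuing anything. The timeout is an addition of mine to avoid waiting forever.

```python
import asyncio


class FakeStreamingResponse:
    """Hypothetical stand-in exposing only the two private fields used above."""

    def __init__(self) -> None:
        self._new_item_event = asyncio.Event()
        self._aqueue: asyncio.Queue = asyncio.Queue()


async def has_first_item(response, timeout: float = 30.0) -> bool:
    """Return True if an item is queued once the event fires, else False."""
    try:
        await asyncio.wait_for(response._new_item_event.wait(), timeout)
    except asyncio.TimeoutError:
        # The event never fired; treat it the same as a failure.
        return False
    return not response._aqueue.empty()


async def demo() -> tuple:
    ok = FakeStreamingResponse()
    ok._aqueue.put_nowait("hello")
    ok._new_item_event.set()

    failed = FakeStreamingResponse()
    failed._new_item_event.set()  # event fired, but nothing was queued

    return await has_first_item(ok), await has_first_item(failed)


checks = asyncio.run(demo())
print(checks)
```

In the endpoint this would become `if not await has_first_item(response): raise HTTPException(500)` before calling `response.async_response_gen()`. It still depends on the same internals, so it would break if those fields are renamed.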