Error

Hi, I'm using StreamingAgentChatResponse's async_response_gen to stream generated text to users through a FastAPI endpoint, and I'd like to return a 5xx status code when something goes wrong. However, it seems that any errors raised during requests to the LLM get swallowed here: https://github.com/run-llama/llama_index/blob/0ee041efadeccb9884052cb393ed5e1dd7b83678/llama-index-core/llama_index/core/chat_engine/types.py#L196
I see a PR was merged recently to allow re-raising exceptions for synchronous calls, but nothing equivalent was added for the async path: https://github.com/run-llama/llama_index/pull/10407/files
Any ideas for how to work around this?
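
For context, the endpoint looks roughly like this (a minimal sketch, not the exact code; the route, the chat_engine construction, and the media type are illustrative):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat")
async def chat(message: str):
    # chat_engine is assumed to be a llama_index chat engine built elsewhere,
    # e.g. index.as_chat_engine(); astream_chat returns a StreamingAgentChatResponse
    response = await chat_engine.astream_chat(message)
    # if the LLM request fails, the error is swallowed and this still returns
    # 200 with an empty body instead of a 5xx
    return StreamingResponse(response.async_response_gen(), media_type="text/plain")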
4 comments
Probably just needs a PR.
Quick update on this: I tried making changes similar to PR #10407 linked above. Unfortunately, that just caused the request to hang, and the error still didn't propagate up to the FastAPI app, which explains this comment: https://github.com/run-llama/llama_index/blob/0ee041efadeccb9884052cb393ed5e1dd7b83678/llama-index-core/llama_index/core/chat_engine/types.py#L176
My guess is that's because awrite_response_to_history is called inside asyncio.create_task, so an exception raised there never reaches the caller.
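
To illustrate that asyncio behavior (a standalone example, not llama_index code): an exception raised inside a coroutine scheduled with asyncio.create_task stays stored on the task object and never propagates to the code that created the task unless the task is awaited or its exception() is retrieved.

import asyncio

async def boom():
    raise RuntimeError("simulated LLM failure")

async def main():
    task = asyncio.create_task(boom())
    await asyncio.sleep(0)  # yield so the task gets a chance to run and fail
    # nothing is raised here; the exception just sits on the task object
    print(task.done(), repr(task.exception()))  # True RuntimeError('simulated LLM failure')

asyncio.run(main())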

I worked around the issue with something like:

from fastapi import HTTPException

# wait until the streaming response signals that something was queued (or finished)
await response._new_item_event.wait()
if response._aqueue.empty():
    # nothing was queued, so the LLM request presumably failed
    raise HTTPException(status_code=500)
# otherwise call response.async_response_gen() and stream as usual

It feels very hacky though since it's accessing "private" fields and relying on internal implementation details of StreamingAgentChatResponse.
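
A less invasive variant of the same idea, as a sketch only (it assumes a failed LLM call leaves the public generator empty): peek at the first chunk of async_response_gen() before committing to a streaming response, and replay it if streaming goes ahead.

from fastapi import HTTPException
from fastapi.responses import StreamingResponse

async def stream_or_500(response):
    # response is the StreamingAgentChatResponse returned by astream_chat()
    gen = response.async_response_gen()
    try:
        first_chunk = await gen.__anext__()
    except StopAsyncIteration:
        # the generator finished without yielding anything, so the LLM call
        # presumably failed; surface that as a 500 instead of an empty 200
        raise HTTPException(status_code=500)

    async def replay():
        yield first_chunk
        async for chunk in gen:
            yield chunk

    return StreamingResponse(replay(), media_type="text/plain")

This only touches the public async_response_gen() API, but it still can't distinguish an LLM error from a legitimately empty stream.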
If you want to debug this, I might at least print the error in the thread
It's less about debugging the exception and more about being able to return an appropriate 5xx status code to users, instead of a 200 status code with no content in the response body.
The error was just a ReadTimeout because the LLM server was busy serving other requests and needed to be scaled.
Currently, StreamingAgentChatResponse doesn't provide a way to know when an exception happens.