

Error

At a glance

The community member is using StreamingAgentChatResponse's async_response_gen to return generated text to users through a FastAPI endpoint. They would like to return a 5xx status code to users when something goes wrong, but it seems that any errors raised during requests to the LLM are getting swallowed. The community member has seen a recent PR that allows reraising exceptions for synchronous calls, but nothing was added for async. They are looking for ideas on how to work around this issue.

In the comments, another community member suggests that the issue probably just needs a PR. The original poster tried making changes similar to the PR mentioned, but that caused the app to hang and the error didn't propagate up to the FastAPI app. They worked around the issue by accessing "private" fields and relying on internal implementation details of StreamingAgentChatResponse, but they acknowledge that this feels very hacky.

Another community member suggests printing the error in the thread, but the original poster clarifies that the issue is less about debugging the exception and more about being able to return the appropriate 5xx status code to users instead of a 200 status code with no content in the response body. The error was a ReadTimeout because the LLM server was busy serving other requests and needed to be scaled. The original poster notes that, currently, StreamingAgentChatResponse doesn't provide a way to know when an exception happens.

Hi, I'm using StreamingAgentChatResponse's async_response_gen to return generated text to users through a FastAPI endpoint, and I'd like to return a 5xx status code to users when something goes wrong. However, it seems that any errors raised during requests to the LLM are getting swallowed here https://github.com/run-llama/llama_index/blob/0ee041efadeccb9884052cb393ed5e1dd7b83678/llama-index-core/llama_index/core/chat_engine/types.py#L196
I see this PR was added recently to allow reraising exceptions for synchronous calls, but nothing was added for async. https://github.com/run-llama/llama_index/pull/10407/files
Any ideas for how to work around this?
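For reference, the endpoint looks roughly like this (simplified sketch; chat_engine here stands in for a LlamaIndex chat engine built elsewhere):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat")
async def chat(message: str):
    # astream_chat returns a StreamingAgentChatResponse
    response = await chat_engine.astream_chat(message)
    # If the LLM request fails inside the background task, the generator just
    # ends, so the client gets a 200 with an empty body rather than a 5xx.
    return StreamingResponse(response.async_response_gen(), media_type="text/plain")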
4 comments
Probably just needs a PR
Quick update on this: I tried making changes similar to PR #10407 linked above. Unfortunately, that just caused it to hang and the error didn't propagate up to the FastAPI app -- which explains this comment https://github.com/run-llama/llama_index/blob/0ee041efadeccb9884052cb393ed5e1dd7b83678/llama-index-core/llama_index/core/chat_engine/types.py#L176
Guessing that's because awrite_response_to_history is called within asyncio.create_task...
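That's standard asyncio behaviour: an exception inside a task created with asyncio.create_task is stored on the task object and only re-raised where the task is awaited, so nothing surfaces in the request handler. A minimal illustration (not LlamaIndex code):

import asyncio

async def call_llm():
    raise TimeoutError("LLM server busy")

async def handler():
    task = asyncio.create_task(call_llm())
    await asyncio.sleep(0.1)  # handler carries on; the exception stays on the task
    # await task              # only awaiting the task would re-raise the TimeoutError

asyncio.run(handler())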

I worked around the issue with something like:

await response._new_item_event.wait()
if response._aqueue.empty():
    raise HTTPException(500)
# call response.async_response_gen()

It feels very hacky though since it's accessing "private" fields and relying on internal implementation details of StreamingAgentChatResponse.
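Roughly how it plugs into the endpoint above (same caveats; this is a sketch that leans on those private fields):

from fastapi import HTTPException
from fastapi.responses import StreamingResponse

@app.post("/chat")
async def chat(message: str):
    response = await chat_engine.astream_chat(message)
    # Wait for the first queue event; if the queue is still empty at that
    # point, assume the LLM call failed and surface a 500 instead of an
    # empty 200.
    await response._new_item_event.wait()
    if response._aqueue.empty():
        raise HTTPException(status_code=500)
    return StreamingResponse(response.async_response_gen(), media_type="text/plain")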
If you want to debug this, I might at least print the error in the thread
It's less about debugging the exception and more about being able to return the appropriate 5xx status code to users instead of a 200 status code with no content in the response body.
The error was just a ReadTimeout because the LLM server was busy serving other requests and needed to be scaled.
Currently, StreamingAgentChatResponse doesn't provide a way to know when an exception happens.