hi πŸ‘‹. we want to provide users of our product with more information when things go wrong in a chat. for example, if they trigger content filtering by a provider like Azure, we want to tell them that.

we're using streaming and a variety of agents and engines to support chat, including OpenAIAgent, ReActAgent, SimpleChatEngine, and CondensePlusContextChatEngine.

when content filters get triggered, I see the following warning. however, the response I get is empty. is there something I can do to get the error message itself? I've even tried accumulating the streaming response like I normally do in successful calls, but I'm not seeing anything.
Plain Text
2024-04-15T18:52:19.426862Z [warning  ] Encountered exception writing response to history: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': True, 'severity': 'high'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}} [llama_index.core.chat_engine.types]
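For context, "accumulating the streaming response" above refers to the usual pattern of draining the streaming chat response, roughly like the sketch below (placeholder Azure credentials and deployment name; exact constructor arguments may differ across versions). When the prompt gets filtered, response_gen yields nothing, so the accumulated text stays empty.

Python
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.llms.azure_openai import AzureOpenAI

# Placeholder credentials / deployment name for illustration only.
llm = AzureOpenAI(
    engine="my-deployment",
    api_key="...",
    azure_endpoint="https://example.openai.azure.com/",
    api_version="2024-02-15-preview",
)
chat_engine = SimpleChatEngine.from_defaults(llm=llm)

response = chat_engine.stream_chat("some user message")

text = ""
for token in response.response_gen:
    text += token

print(text)  # ends up empty when the prompt was filtered; only the warning is logged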
20 comments
I think you'd have to intercept the LLM event, using instrumentation

Specifically, you can capture the SpanDropEvent in a custom event handler
https://docs.llamaindex.ai/en/latest/examples/instrumentation/basic_usage/?h=instrumentation
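A minimal sketch of such a handler, based on the linked docs (the handler name and the errors list are illustrative, and the exact event fields may vary between LlamaIndex versions):

Python
from typing import List

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events.span import SpanDropEvent


class ErrorSurfacingHandler(BaseEventHandler):
    """Collects error strings from dropped spans so they can be shown to the user."""

    errors: List[str] = []

    @classmethod
    def class_name(cls) -> str:
        return "ErrorSurfacingHandler"

    def handle(self, event, **kwargs) -> None:
        # SpanDropEvent is emitted when a span exits with an exception and
        # carries the error text in err_str.
        if isinstance(event, SpanDropEvent):
            self.errors.append(event.err_str)


# Attach to the root dispatcher so events from all spans are observed;
# keep a reference to the handler so the captured errors can be read later.
handler = ErrorSurfacingHandler()
get_dispatcher().add_event_handler(handler)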
hey logan! wouldn't instrumentation be more for observability purposes? the need here is to be able to synchronously tell the user that something went wrong with their request. would you still suggest using instrumentation for that?
instrumentation can definitely be used for that. I don't know how else I would intercept this
hm, I see. I'm wondering if it'd make more sense for the error to be thrown instead of being logged. the error status code is 400, which is a bad request. wouldn't it make sense for this error to be treated like any other 4xx error, e.g. an invalid model name being passed in?
Since it's happening in a thread, it's pretty complicated
We spin up a thread in order to write to history, and also expose a generator at the same time
Would love to clean this logic up, if you are interested in contributing
I could potentially work with Sammy on this since he's already contributed to LlamaIndex. I guess what I'm confused about is why we even need to worry about the history thread in the first place. is it not possible to throw the error before writing to history? is the idea that we only see the error upon writing to history because we're streaming?
and even with the history thread issue, I'm confused about why I'm unable to see the same error in the response itself. the same chat_response that's used to write to history is returned to the caller.
So, look at it this way

llm.stream_chat() gets called

We need to iterate over this until we know one of two things
a) Is there a tool call? Ok cool, exhaust the stream, call the tool, continue the loop
b) there's no tool call? Ok great, we have to give the user a generator

The hard part here is we need to expose a generator to the user, as well as save the message to the chat history once the stream is exhausted

So, a thread is spun up that both exposes a generator and writes to history upon completion.

Because of how this logic is split up, the generator has no idea if an error ever gets raised
Its confusing, I know, but there's a reason it is the way it is πŸ˜…
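To make the shape of the problem concrete, here is a heavily simplified sketch of that pattern (this is not LlamaIndex's actual code, and chunk.delta is assumed to be the streamed token): the worker thread only logs the exception, so the generator handed back to the caller just ends early and looks empty.

Python
import logging
import queue
import threading

logger = logging.getLogger(__name__)


def stream_chat_sketch(llm, messages, chat_history):
    # Simplified sketch of the pattern described above, not the real implementation.
    q: queue.Queue = queue.Queue()
    sentinel = object()

    def produce_and_write_history():
        final_text = ""
        try:
            # The real code first inspects the stream for tool calls; omitted here.
            for chunk in llm.stream_chat(messages):
                final_text += chunk.delta
                q.put(chunk.delta)
            chat_history.append(final_text)
        except Exception as exc:
            # This is the problem: the error is only logged inside the thread,
            # so the generator below never sees it and simply stops.
            logger.warning("Encountered exception writing response to history: %s", exc)
        finally:
            q.put(sentinel)

    threading.Thread(target=produce_and_write_history, daemon=True).start()

    def response_gen():
        while True:
            item = q.get()
            if item is sentinel:
                return
            yield item

    return response_gen()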
ok I worked with Sammy, and I'm now caught up on why it is how it is. what would be the best approach to clean this up? also, it seems a little odd to us that the actual completion call is triggered by saving the message to chat history instead of when the user consumes the generator.
It's triggered that way because we can't return to the user until we know for sure it's not calling tools (this is really only relevant for agents; other chat engines followed that pattern for simplicity)

I'm not sure what can be done to clean this up actually πŸ˜… I've been meaning to take a stab at it, but every time I can't think of a way to get rid of the threading that works nicely
ah man. I'm going to have to come up with an interim solution for letting users know about content filtering, but I'd be down to contribute. if you happen to come up with a good solution and just need it implemented, let me know!
@Logan M Hi! Wanted to follow-up on this, and see if it's possible to get this prioritized

Since this essentially obfuscates all completion error messages at runtime, we're unable to provide our production users with a useful message that tells them what is actually happening

This is a huge issue for us, and is making us consider moving at least partially off LlamaIndex since it's really hurting our product experience and affecting our customer retention
I really recommend just raising PRs for things that are issues. I get a huge amount of asks on discord/github, it can be hard to prioritize. but the beauty of open-source is just being able to fix things on your own time too πŸ‘
jk on that fix -- even raising the error in a thread doesn't propagate to the parent context. Need some sort of message passing system to get it to bubble up properly
ok, that was easier to fix than I thought
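For anyone landing here later, the general shape of that kind of fix is to message-pass the exception through the same queue and re-raise it on the consumer side. A self-contained sketch of the idea (not necessarily the exact change that went into LlamaIndex):

Python
import queue
import threading


def stream_with_error_propagation(produce_chunks):
    # Exceptions raised in the worker thread are put on the queue and re-raised
    # on the consumer side, so the caller finally sees the original error.
    q: queue.Queue = queue.Queue()
    sentinel = object()

    def worker():
        try:
            for chunk in produce_chunks():
                q.put(chunk)
        except Exception as exc:
            q.put(exc)  # pass the exception to the parent context instead of just logging it
        finally:
            q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()

    def gen():
        while True:
            item = q.get()
            if item is sentinel:
                return
            if isinstance(item, Exception):
                raise item  # e.g. the 400 content_filter error from the question above
            yield item

    return gen()


# Demo: the producer fails midway; the exception surfaces in the caller's loop.
def flaky_producer():
    yield "hello "
    raise RuntimeError("Error code: 400 - content_filter")


try:
    for token in stream_with_error_propagation(flaky_producer):
        print(token)
except RuntimeError as err:
    print("caller saw the error:", err)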