Updated 2 months ago

I'm creating a LlamaIndex agent with OpenAIAssistantAgent.from_existing(). The assistant uses OpenAI's code interpreter, and generates text and images (eg a rendered graph). I can't figure out how to set this up - I get a single text response from a .chat() call to the agent. How do I receive an image and how do I see the intermediate steps?
48 comments
Hmm, I'm not even sure if the OpenAIAssistantAgent is set up to work with modalities beyond text
I'm also not sure if the intermediate steps are exposed?
Disclaimer: I've also never used the OpenAI assistant, I'm just really familiar with the codebase
The intermediate steps, and pulling images, are exposed in the OpenAI API. Maybe it's not possible via the LlamaIndex wrapper?
I thiiiiink response.sources may have what you are looking for
At least, that's where the tool calls and their outputs get placed
Thanks, I will take a look, appreciate the pointer. I'm done for the day so it'll be tomorrow
you can also do agent.chat_history to get the chat history, but I'm not 100% sure what that will look like (it's pulled from the OpenAI API though)
probably also has some good details
Ah ok good ideas
OpenAI Assistant can respond with images. I think it is a LlamaIndex agent issue. Also trying to figure this out.
The two options mentioned above don't help? I really felt like that response.sources would have it, since that contains the output of every tool call used when creating the response
result.sources is an empty list so that doesn't seem to work. result.response is a string starting with "The chart above...". That's the LLM's response, it thinks it made a chart (and I believe it did too)
agent.chat_history has two messages in it. They look like the prompt I used, and the text response. The messages, as well as the agent itself, have some extra properties like "files_dict" or "file_ids" but those are empty dicts
ah I see... looking at the code, sources only gets filled in for local tools... I think?

If you are up for it, would love to see a PR to fix this somehow
The chainlit example retrieves the steps from the run and iterates over them, while the LlamaIndex code doesn't do that. Is it possible that the LlamaIndex code needs to iterate over the steps to capture the intermediate info (images, intermediate steps, etc.)?
I don't know the OpenAI api well enough to know for sure.
maybe steps aren't necessary except to show intermediate progress?
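For reference, here is a minimal sketch of what "iterating over the steps" could look like when talking to the OpenAI API directly, outside LlamaIndex. It assumes the v1 `openai` Python client and its beta Assistants endpoints (`client.beta.threads.runs.steps.list`). The helper names `describe_step` and `list_step_summaries` are made up for illustration, and the attribute layout follows the run step objects as I understand them, so treat it as a starting point rather than a tested integration.

```python
from typing import Any, Dict, List


def describe_step(step: Any) -> List[Dict[str, Any]]:
    """Flatten one run step into simple dicts (hypothetical helper)."""
    details = step.step_details
    events: List[Dict[str, Any]] = []
    if details.type == "message_creation":
        # The step produced a thread message; only its ID is recorded here.
        events.append(
            {"kind": "message", "message_id": details.message_creation.message_id}
        )
    elif details.type == "tool_calls":
        for call in details.tool_calls:
            if call.type != "code_interpreter":
                # Other tool types are only noted, not unpacked.
                events.append({"kind": call.type})
                continue
            events.append({"kind": "code", "input": call.code_interpreter.input})
            for out in call.code_interpreter.outputs:
                if out.type == "logs":
                    events.append({"kind": "logs", "logs": out.logs})
                elif out.type == "image":
                    # Images are referenced by file ID, not inlined.
                    events.append({"kind": "image", "file_id": out.image.file_id})
    return events


def list_step_summaries(client: Any, thread_id: str, run_id: str) -> List[Dict[str, Any]]:
    """Fetch the steps of a run, oldest first, and flatten them."""
    steps = client.beta.threads.runs.steps.list(
        thread_id=thread_id, run_id=run_id, order="asc"
    )
    summaries: List[Dict[str, Any]] = []
    for step in steps.data:
        summaries.extend(describe_step(step))
    return summaries
```

Here `client` would be an `openai.OpenAI()` instance; only the first page of steps is read, which is an assumption that the run is short.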
I see a few problems:
1) when you convert OpenAI messages to a LlamaIndex ChatMessage, you drop content if it isn't MessageContentText. So you drop MessageContentImageFile, for example:
def from_openai_thread_message(thread_message: Any) -> ChatMessage:
    """From OpenAI thread message."""
    from openai.types.beta.threads import MessageContentText, ThreadMessage

    thread_message = cast(ThreadMessage, thread_message)

    # we don't have a way of showing images, just do text for now
    text_contents = [
        t for t in thread_message.content if isinstance(t, MessageContentText)
    ]
    text_content_str = " ".join([t.text.value for t in text_contents])
2) smaller problem: the _chat() function only returns the latest message, and the chat response is a string only. To support the Assistants API you would want to return something like a list of messages.
latest_message = self.latest_message
# get most recent message content
return AgentChatResponse(
    response=str(latest_message.content),
    sources=metadata["sources"],
)
The OpenAI message call actually returns all the messages properly, including file metadata:
        raw_messages = self._client.beta.threads.messages.list(
            thread_id=self._thread_id, order="desc"
        )
So I guess I know how to do a PR to fix my scenario, but I don't know why the choices were made the way they were and if changing things would break anything. Can ChatMessage be extended to support OpenAI's MessageContentImageFile?
(note: I just made a small edit to clarify my description of problem #1 above)
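To make problem 1 concrete, here is a minimal sketch of a conversion that keeps image references instead of dropping them. It is not the library's code: `split_thread_message` is a hypothetical helper, and it checks each content block's `type` field ("text" or "image_file", as in the Assistants API message objects) rather than using isinstance, so it runs without importing the openai types.

```python
from typing import Any, List, Tuple


def split_thread_message(thread_message: Any) -> Tuple[str, List[str]]:
    """Return (joined text, image file IDs) for one thread message."""
    texts: List[str] = []
    image_file_ids: List[str] = []
    for block in thread_message.content:
        if block.type == "text":
            texts.append(block.text.value)
        elif block.type == "image_file":
            # The image itself is not in the message, only its file ID.
            image_file_ids.append(block.image_file.file_id)
    return " ".join(texts), image_file_ids
```

Applied to each item of the raw_messages list above, this would give back both the "The chart above..." text and the file ID of the rendered chart.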
I think the OpenAIAssistant class should be marked as beta, it was written in about a day to support their release 😅 I don't see too many people using it, hence why this is probably unaddressed until now

The ChatMessage object has an additional_kwargs field that can probably be leveraged here (and I see it's already being used to fill in some ID values)
I think another option is putting the image into the sources, since we don't technically have a clear abstraction yet to support chat messages that are images

Either option would probably work though
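Whichever place the image reference ends up, what the API hands back is a file ID, so the caller still has to download the bytes. A minimal sketch, assuming the v1 `openai` client where `client.files.content(file_id)` returns a binary response with a `.read()` method. `save_assistant_file` is a hypothetical helper name, and the `.png` extension is an assumption about what the code interpreter produced.

```python
from pathlib import Path
from typing import Any


def save_assistant_file(client: Any, file_id: str, out_dir: str = ".") -> Path:
    """Download a file produced by the assistant and write it to disk."""
    data = client.files.content(file_id).read()
    # Assumption: the file is a PNG image; adjust the suffix if it is not.
    path = Path(out_dir) / f"{file_id}.png"
    path.write_bytes(data)
    return path
```

Usage would be something like `save_assistant_file(OpenAI(), file_id)` with a file ID taken from an image content block or a code interpreter output.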
@Logan M I think you're right that ChatMessage's additional_kwargs can go a long way to solve this. The full OpenAI message is already in there anyway.

Here's how I think you could do this:
1) add a message_type in ChatMessage to distinguish normal text message from other messages. (One alternative is that an empty content string could indicate it's not a text message and the user has to look in the additional_kwargs to parse the message)

2) stop filtering out the unknown messages in from_openai_thread_message() -- this could have more implications, so it could also be an optional parameter

3) create a new wrapper around run_assistant() that returns a list of ChatMessages
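The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchChatMessage` is a hypothetical stand-in for ChatMessage (the real class has no message_type field today), the "mixed" label and the `non_text_content` key are invented here, and `convert_thread_messages` stands in for the proposed wrapper by converting an already-fetched list of raw thread messages.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class SketchChatMessage:
    """Hypothetical stand-in for ChatMessage with an extra message_type."""

    role: str
    content: str
    message_type: str = "text"
    additional_kwargs: Dict[str, Any] = field(default_factory=dict)


def convert_thread_message(
    thread_message: Any, keep_non_text: bool = True
) -> SketchChatMessage:
    """Convert one thread message, optionally keeping non-text blocks."""
    texts = [b.text.value for b in thread_message.content if b.type == "text"]
    others = [b for b in thread_message.content if b.type != "text"]
    if not keep_non_text:
        # Mirrors the current behaviour: anything that is not text is dropped.
        others = []
    return SketchChatMessage(
        role=thread_message.role,
        content=" ".join(texts),
        message_type="mixed" if others else "text",
        additional_kwargs={
            "thread_message": thread_message,
            "non_text_content": others,
        },
    )


def convert_thread_messages(
    raw_messages: Iterable[Any], keep_non_text: bool = True
) -> List[SketchChatMessage]:
    """Convert every message instead of returning only the latest one."""
    return [convert_thread_message(m, keep_non_text) for m in raw_messages]
```

Making `keep_non_text` a parameter keeps the old text-only behaviour available, which is the "optional parameter" idea from step 2.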
I think that makes sense to me.

A little hesitant to change things related to the ChatMessage object since it's fairly low-level and used in a lot of places, but I suppose some testing would ensure that it doesn't break anything haha
I know how to fix my use case now. But for fixing this in the library, it'd be good to get some guidance from people on the team who have a vision for how this should work. Maybe you're that person?
Also, if we want to support similar semantics as the existing chat() method, we'd have to deal with AgentChatResponse. That only holds a single string response, while it seems like it should contain a list of ChatMessages or something. Again something that could have more implications (including consumers of the event EventPayload.RESPONSE)
so i guess there are a few things to deal with to fix this properly
I think I disagree with that last point? You can always use agent.chat_history to get chat messages

Since much of llama-index is based around text, handling images is still something we are figuring out for much of the library. Maybe this requires some new ImageChatMessage object, but not sure.

In my mind, the image gets added as either a source, or as part of the message
I think even with text, it seems like an agent response should be a list of messages instead of a single message, since agents can have multiple steps
but those steps are things that inform the final response (i.e. a source)
you might be right - you're saying that in most cases you just want the final response, and if you DO want the other steps you go get the chat_history
i'm not that familiar with all the use cases of agents so that seems reasonable
Yea that's what I'm getting at. I think the callbacks should also be exposing intermediate steps, for applications that want to display that information in real time (I'm aware our callback coverage is not great at the moment too lol)
so then the issue is just that chat_history filters out non-text responses, and that ChatMessage doesn't have semantics for holding a non-text response
Yes, it would be great if the library could "yield" or call a callback with messages for progress steps.
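A minimal sketch of what "yielding" progress could look like against the OpenAI API directly, assuming the v1 `openai` client (`runs.retrieve` and `runs.steps.list` under `client.beta.threads`). `iter_run_steps` is a hypothetical name, not something LlamaIndex provides. It polls the run and yields each step once it has completed; steps that fail or get cancelled are never yielded in this version.

```python
import time
from typing import Any, Iterator, Set


def iter_run_steps(
    client: Any, thread_id: str, run_id: str, poll_seconds: float = 1.0
) -> Iterator[Any]:
    """Yield each run step once it has completed, until the run stops."""
    seen: Set[str] = set()
    while True:
        # Read the run status first, so the final listing is not missed.
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        steps = client.beta.threads.runs.steps.list(
            thread_id=thread_id, run_id=run_id, order="asc"
        )
        for step in steps.data:
            if step.status == "completed" and step.id not in seen:
                seen.add(step.id)
                yield step
        if run.status not in ("queued", "in_progress", "cancelling"):
            # Finished, failed, expired, cancelled, or waiting on tool outputs.
            break
        time.sleep(poll_seconds)
```

A caller could loop over `iter_run_steps(client, thread_id, run_id)` and render each step as it arrives, or forward it to a callback.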
technically our non-OpenAI-assistant agents can do this 🤔 We recently separated the top-level state from running individual steps into two object classes (a runner and a worker)

https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/agent_runner.html#high-level-agent-architecture

https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/modules.html#id1
yeah i think you can be compatible with this in OpenAI
but it doesn't work that way now when using the assistants API, right ?
yea not right now
only OpenAIAgent and ReActAgent
thanks for all the input