Multimodal react agent worker support for gemini models

MMichi

Does anyone know if there are plans to add MultimodalReActAgentWorker Support for Gemini Models?

7 comments

LLogan M

It should already be supported? I see a GeminiMultiModal class is already there

Attachment

LLogan M

pip install llama-index-multi-modal-llms-gemini

from llama_index.multi_modal_llms.gemini import GeminiMultiModal

LLogan M

(i think this is rarely used, so not entirely sure if its fully up to date 😅)

WWhiteFang_Jr

This may have caused the confusion I guess.

https://docs.llamaindex.ai/en/stable/examples/multi_modal/mm_agent/#beta-multi-modal-react-agent

Attachment

LLogan M

oh interesting, yea thats not true haha

MMichi

Thanks for the heads up! I tried to get MultimodalReActAgentWorker to work with GeminiMultiModal, but it seems the format ist not what Gemini expects and only works with openAI models right now. MMGemini model returns that it expects a blog or image but go a list

Plain Text

class MultimodalReActAgentWorker(BaseAgentWorker):

...

self._add_user_step_to_reasoning = partial(
                add_user_step_to_reasoning,
                generate_chat_message_fn=generate_openai_multi_modal_chat_message,  # type: ignore
            )

and there is this todo in the MultimodalReActAgentWorker # TODO: support gemini as well. Currently just supports OpenAI

LLogan M

Feel free to make a PR if its not working. I don't have access to gemini to test this

Add a reply

Find answers from the community

Multimodal react agent worker support for gemini models