Find answers from the community

Updated 2 months ago

Multimodal react agent worker support for gemini models

Does anyone know if there are plans to add MultimodalReActAgentWorker Support for Gemini Models?
L
W
M
7 comments
It should already be supported? I see a GeminiMultiModal class is already there
Attachment
image.png
pip install llama-index-multi-modal-llms-gemini

from llama_index.multi_modal_llms.gemini import GeminiMultiModal
(i think this is rarely used, so not entirely sure if its fully up to date πŸ˜…)
oh interesting, yea thats not true haha
Thanks for the heads up! I tried to get MultimodalReActAgentWorker to work with GeminiMultiModal, but it seems the format ist not what Gemini expects and only works with openAI models right now. MMGemini model returns that it expects a blog or image but go a list
Plain Text
class MultimodalReActAgentWorker(BaseAgentWorker):

...

self._add_user_step_to_reasoning = partial(
                add_user_step_to_reasoning,
                generate_chat_message_fn=generate_openai_multi_modal_chat_message,  # type: ignore
            )


and there is this todo in the MultimodalReActAgentWorker # TODO: support gemini as well. Currently just supports OpenAI
Feel free to make a PR if its not working. I don't have access to gemini to test this
Add a reply
Sign up and join the conversation on Discord