Find answers from the community

Updated 8 months ago

can I somehow pass image_urls into a

can I somehow pass image_urls into a agent thats using a multimodal model?
T
m
3 comments
Which agent are you using? I think you can do something like this:

Plain Text
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    "https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
]

image_documents = load_image_urls(image_urls)


Then you can pass them to a multimodal model:

Plain Text
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview", max_new_tokens=300
)


complete_response = openai_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)


https://docs.llamaindex.ai/en/stable/examples/multi_modal/openai_multi_modal.html
I need chat not completions as well
Add a reply
Sign up and join the conversation on Discord