Updated 3 months ago

Hello! When using OpenAIMultiModal.complete, can we specify the output class? I pass it like this:

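from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# output_class, api_key, image_url and prompt are defined elsewhere
image_llm = OpenAIMultiModal(
    model='gpt-4o',
    output_cls=output_class,  # the argument in question
    api_key=api_key,
    max_new_tokens=1000,
    temperature=0.0,
)
image_doc = load_image_urls([image_url])

response_vision = image_llm.complete(
    prompt=prompt,
    image_documents=image_doc,
)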

It doesn't throw an exception, but the response doesn't contain an instance of the output class either. Thanks!

4 comments

Logan M

No, you can't.

This guide might be helpful
https://docs.llamaindex.ai/en/stable/examples/multi_modal/ollama_cookbook/#structured-data-extraction-from-images

SeaCat

Thank you!

SeaCat

Interesting... is it possible to implement the same example, but with OpenAIMultiModal?

SeaCat

Cool, I was able to do what I needed, with OpenAI and this approach. Many thanks!!
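
For anyone landing here later: the final code wasn't posted, so what follows is only a minimal sketch of how the linked guide's approach might look with OpenAIMultiModal in place of Ollama. It assumes the guide's pattern of wrapping the model in a MultiModalLLMCompletionProgram with a PydanticOutputParser. The helper name build_program and the exact prompt wording are illustrative, not something confirmed in this thread. The output class is passed to the parser rather than to OpenAIMultiModal itself.

```python
# Prompt wording adapted from the linked guide; the parser is expected to
# append the schema/format instructions for the output class.
PROMPT_TEMPLATE = """\
{query_str}

Return the answer as a Pydantic object. The Pydantic schema is given below:

"""


def build_program(output_cls, image_url: str, api_key: str):
    """Hypothetical helper: build a program whose call returns an output_cls instance.

    output_cls is assumed to be a Pydantic BaseModel subclass.
    """
    # Third-party imports are kept inside the function so this sketch can be
    # loaded without llama-index installed; move them to the top in real code.
    from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
    from llama_index.core.output_parsers import PydanticOutputParser
    from llama_index.core.program import MultiModalLLMCompletionProgram
    from llama_index.multi_modal_llms.openai import OpenAIMultiModal

    # Same model settings as in the question, minus output_cls.
    image_llm = OpenAIMultiModal(
        model="gpt-4o",
        api_key=api_key,
        max_new_tokens=1000,
        temperature=0.0,
    )

    return MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_cls=output_cls),
        image_documents=load_image_urls([image_url]),
        prompt_template_str=PROMPT_TEMPLATE,
        multi_modal_llm=image_llm,
    )
```

Usage would be roughly: program = build_program(output_class, image_url, api_key), then result = program(query_str=prompt). If the model's reply parses cleanly, result should be an instance of output_class; if the reply isn't valid JSON for that schema, expect the parser to raise instead.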
