Vision

At a glance

The community members are discussing the best vision model that performs as well as GPT-4o, that is, an LLM with a vision modality. Suggestions include Llama 3.1/3.2 (whichever is the multimodal one), as well as closed-source models like Claude Sonnet and Gemini. One community member mentions Qwen but is unsure about its multimodal capabilities. There is also a discussion about the reliability of academic benchmarks, with one community member suspecting that most models overfit to them. Another community member shares their experience with OpenAI's vision model, noting that it gets only one character wrong on an 8,000-character document full of numbers. Finally, a community member expresses interest in a similar model that can run locally.

What is the best vision model currently that performs as well as GPT-4o? That is, an LLM with a vision modality.
7 comments
Like open source? Probably llama3.1/2 or whatever the multimodal one is. Although it will be nowhere close to OpenAI lol

Closed source, sonnet and gemini are very good
Hi!

Either, as long as they can run locally.
What about qwen
Haven't heard much about qwen's multimodal capability tbh

I wouldn't trust academic benchmarks too much. I feel like most models these days are overfitting to the benchmarks out there
oh, really? That's interesting.
with OAI vision on an 8,000-character document (with lots of numbers), I am getting 1 character wrong
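For reference, a request like the one described above would look roughly like the following. This is a minimal sketch using the OpenAI Python SDK's chat completions API with an image input; the prompt wording, the file name, and the choice of gpt-4o are assumptions, not details taken from the thread.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical input: a scanned page full of numbers (file name is an assumption)
with open("document_page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe all text in this image exactly, preserving every digit."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```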
i'd like a similar model but running locally
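One way to try the same thing locally is a multimodal model served through Ollama. The sketch below assumes the `ollama` Python package and a locally pulled vision-capable model; the `llama3.2-vision` tag and the file path are assumptions, and nothing in the thread endorses this particular setup.

```python
import ollama

# Assumes the model was pulled beforehand, e.g. `ollama pull llama3.2-vision`
response = ollama.chat(
    model="llama3.2-vision",
    messages=[
        {
            "role": "user",
            "content": "Transcribe all text in this image exactly, preserving every digit.",
            "images": ["document_page.png"],  # hypothetical local file path
        }
    ],
)

print(response["message"]["content"])
```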