Can I Run a Non-Quantized LLaMA Model with Ollama?
At a glance
The community member is using llama-index with local LLMs and is looking for a way to run a non-quantized model with Ollama. They specifically want to use the llama3.2-vision:11b model, but even the 11b-instruct-fp16 tag seems too quantized for their vision task.
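For context, the `-fp16` tag already stores the weights at 16-bit floating point, which is typically as close to non-quantized as Ollama's published tags get. Below is a minimal sketch of wiring that tag into a llama-index pipeline, assuming the `llama-index-multi-modal-llms-ollama` integration package is installed, a local Ollama server is running, and the fp16 tag has been pulled; `table.png` is a placeholder for the image being processed.

```python
# Sketch: use the fp16 (16-bit) tag of the vision model from llama-index.
# Assumes `pip install llama-index-multi-modal-llms-ollama` and that
# `llama3.2-vision:11b-instruct-fp16` has already been pulled with Ollama.
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(model="llama3.2-vision:11b-instruct-fp16")

# Load the test image as an ImageDocument (placeholder path).
image_docs = SimpleDirectoryReader(input_files=["table.png"]).load_data()

resp = mm_llm.complete(
    prompt="Extract every row of the table in this image as CSV.",
    image_documents=image_docs,
)
print(resp.text)
```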
In the comments, another community member mentions that they swapped the Ollama model for the Llama 3.2 11B Vision Instruct Turbo model hosted on Together AI, which extracts the contents of an image of a table consistently well, whereas the Ollama model only extracted partial data. They plan to stick with the hosted model until they figure out the issue with Ollama.
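The hosted call described here can be reproduced through Together AI's OpenAI-compatible endpoint. This is a sketch under the assumption that the model ID is `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo` and that an API key is available in a `TOGETHER_API_KEY` environment variable; the image is sent base64-encoded as a data URL.

```python
# Sketch: run the same table-extraction prompt against the hosted model on
# Together AI via its OpenAI-compatible chat endpoint. The model ID and the
# environment variable name are assumptions; check Together's model catalog.
import base64
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

with open("table.png", "rb") as f:  # placeholder path to the test image
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    temperature=0.0,  # pin sampling so results are comparable across providers
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every row of the table in this image as CSV."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```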
Other community members suggest that the two providers may have different default temperature or sampling settings, and that Ollama may handle images differently than Together AI. They recommend opening a GitHub issue on the Ollama repository to report the problem.
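One way to test the default-configuration theory is to pin the same sampling settings on the local model that the hosted call uses. A small sketch using the llama-index Ollama multimodal wrapper; `additional_kwargs` is assumed to be forwarded to Ollama's request `options`, so the parameter names follow Ollama's options schema and are worth double-checking.

```python
# Sketch: rule out sampling-default differences by pinning the same settings
# locally that the hosted call uses. `additional_kwargs` is assumed to be
# passed through to Ollama's request options.
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(
    model="llama3.2-vision:11b-instruct-fp16",
    temperature=0.0,  # match the temperature used for the hosted call
    additional_kwargs={"top_p": 1.0, "seed": 42},  # assumed Ollama options
)
```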
quick question. i'm using llama-index for my rag pipeline with local llms. is there a way to run a non-quantized model with ollama? i'm hoping to run llama3.2-vision:11b but even the 11b-instruct-fp16 version seems too quantized for my vision task
Interesting. Maybe those two providers have different default configs for temperature or sampling params. Could also be ollama handles images differently than together ai 🤔 I know it took them quite a while to get vision support out. Might be worth a github issue on the ollama repo
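If it does come down to how Ollama handles the image, a useful step before opening an issue is to bypass llama-index and call the local server directly, with sampling pinned, so the raw output can be pasted into the bug report. A sketch using the `ollama` Python client; the image path is a placeholder.

```python
# Sketch: reproduce the table-extraction case against the local Ollama server
# directly, with deterministic-ish sampling, to capture output for a bug report.
# Assumes `pip install ollama` and that the fp16 tag has been pulled.
import ollama

resp = ollama.chat(
    model="llama3.2-vision:11b-instruct-fp16",
    messages=[{
        "role": "user",
        "content": "Extract every row of the table in this image as CSV.",
        "images": ["table.png"],  # placeholder path to the test image
    }],
    options={"temperature": 0, "top_p": 1.0, "seed": 42},
)
print(resp["message"]["content"])
```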