Can I Run a Non-Quantized LLaMA Model with Ollama?

Quick question: I'm using llama-index for my RAG pipeline with local LLMs. Is there a way to run a non-quantized model with Ollama? I'm hoping to run llama3.2-vision:11b, but even the 11b-instruct-fp16 version seems too quantized for my vision task.
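For reference, this is roughly how I'd wire the fp16 tag into llama-index (just a sketch, not verified end-to-end; I'm assuming the llama-index-multi-modal-llms-ollama integration and the 11b-instruct-fp16 tag name from the Ollama model library):

```python
# Sketch: pull the fp16 tag, then point llama-index's Ollama multi-modal
# wrapper at it. The tag name is assumed from the Ollama model library:
#
#   ollama pull llama3.2-vision:11b-instruct-fp16

from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

# Use the fp16 tag instead of the default (quantized) 11b tag.
mm_llm = OllamaMultiModal(model="llama3.2-vision:11b-instruct-fp16")

response = mm_llm.complete(
    prompt="Extract the contents of the table in this image as CSV.",
    image_documents=[ImageDocument(image_path="table.png")],  # local image path
)
print(response.text)
```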
3 comments
Thanks for the detailed reply.

I swapped the Ollama model out for the Llama 3.2 11B Vision Instruct Turbo on Together AI to extract data from an image of a table (rough sketch of the hosted call below).

The hosted model extracts the contents consistently well, while the Ollama model has only ever extracted partial data.

Guess I'll stick with the hosted model till I figure it out.
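In case it helps anyone, this is roughly the hosted call I switched to (a sketch only; the model id and the OpenAI-compatible endpoint details are my assumptions, so double-check them against Together's docs):

```python
# Sketch of the hosted call via Together's OpenAI-compatible endpoint.
# Model id and base_url are assumptions; adjust if they differ.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="TOGETHER_API_KEY",  # placeholder
)

with open("table.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",  # assumed model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the full table from this image as CSV."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    temperature=0,  # keep extraction deterministic-ish
)
print(resp.choices[0].message.content)
```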
Interesting. Maybe the two providers have different default configs for temperature or sampling params. Could also be that Ollama handles images differently than Together AI πŸ€” I know it took them quite a while to get vision support out. Might be worth a GitHub issue on the Ollama repo.
Yeah, I tried setting the same options (sketch of what I tried below). Could be the image-handling difference. Time to add to the 1.1k open issues on Ollama lol
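For the record, this is roughly how I was pinning the sampling params on the Ollama side (a sketch, assuming the `ollama` Python client and the option names from Ollama's parameter docs):

```python
# Sketch: pin sampling options on the Ollama side to match the hosted run.
# Option names (temperature, top_p, num_predict) follow Ollama's parameter docs.
import ollama

resp = ollama.chat(
    model="llama3.2-vision:11b-instruct-fp16",
    messages=[{
        "role": "user",
        "content": "Extract the full table from this image as CSV.",
        "images": ["table.png"],  # local image path
    }],
    options={
        "temperature": 0,     # greedy-ish decoding for extraction
        "top_p": 1.0,
        "num_predict": 2048,  # room to emit the whole table
    },
)
print(resp["message"]["content"])
```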