Can I Run a Non-Quantized LLaMA Model with Ollama?
At a glance
The community member is using llama-index with local LLMs and is looking for a way to run a non-quantized model with Ollama. They specifically want to use the llama3.2-vision:11b model, but even the 11b-instruct-fp16 tag seems too quantized for their vision task.
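For context, the `-fp16` tag already stores the weights at 16-bit floating point, which is typically as close to non-quantized as Ollama's published tags get. Below is a minimal sketch of wiring that tag into a llama-index pipeline, assuming the `llama-index-multi-modal-llms-ollama` integration package is installed, a local Ollama server is running, and the fp16 tag has been pulled; `table.png` is a placeholder for the image being processed.

```python
# Sketch: use the fp16 (16-bit) tag of the vision model from llama-index.
# Assumes `pip install llama-index-multi-modal-llms-ollama` and that
# `llama3.2-vision:11b-instruct-fp16` has already been pulled with Ollama.
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(model="llama3.2-vision:11b-instruct-fp16")

# Load the test image as an ImageDocument (placeholder path).
image_docs = SimpleDirectoryReader(input_files=["table.png"]).load_data()

resp = mm_llm.complete(
    prompt="Extract every row of the table in this image as CSV.",
    image_documents=image_docs,
)
print(resp.text)
```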
In the comments, another community member mentions that they swapped the Ollama model for the Llama 3.2 11B Vision Instruct Turbo model hosted on Together AI, which extracts the contents of an image of a table consistently well, whereas the Ollama model only extracted partial data. They plan to stick with the hosted model until they figure out the issue with Ollama.
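The hosted call described here can be reproduced through Together AI's OpenAI-compatible endpoint. This is a sketch under the assumption that the model ID is `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo` and that an API key is available in a `TOGETHER_API_KEY` environment variable; the image is sent base64-encoded as a data URL.

```python
# Sketch: run the same table-extraction prompt against the hosted model on
# Together AI via its OpenAI-compatible chat endpoint. The model ID and the
# environment variable name are assumptions; check Together's model catalog.
import base64
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

with open("table.png", "rb") as f:  # placeholder path to the test image
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    temperature=0.0,  # pin sampling so results are comparable across providers
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every row of the table in this image as CSV."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```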
Other community members suggest that the two providers may have different default temperature or sampling settings, and that Ollama may handle images differently than Together AI. They recommend opening a GitHub issue on the Ollama repository to report the problem.
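One way to test the default-configuration theory is to pin the same sampling settings on the local model that the hosted call uses. A small sketch using the llama-index Ollama multimodal wrapper; `additional_kwargs` is assumed to be forwarded to Ollama's request `options`, so the parameter names follow Ollama's options schema and are worth double-checking.

```python
# Sketch: rule out sampling-default differences by pinning the same settings
# locally that the hosted call uses. `additional_kwargs` is assumed to be
# passed through to Ollama's request options.
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(
    model="llama3.2-vision:11b-instruct-fp16",
    temperature=0.0,  # match the temperature used for the hosted call
    additional_kwargs={"top_p": 1.0, "seed": 42},  # assumed Ollama options
)
```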
quick question. i'm using llama-index for my rag pipeline with local llms. is there a way to run a non-quantized model with ollama? i'm hoping to run llama3.2-vision:11b but even the 11b-instruct-fp16 version seems too quantized for my vision task
Interesting. Maybe those two providers have different default configs for temperature or sampling params. Could also be ollama handles images differently than together ai 🤔 I know it took them quite a while to get vision support out. Might be worth a github issue on the ollama repo
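If it does come down to how Ollama handles the image, a useful step before opening an issue is to bypass llama-index and call the local server directly, with sampling pinned, so the raw output can be pasted into the bug report. A sketch using the `ollama` Python client; the image path is a placeholder.

```python
# Sketch: reproduce the table-extraction case against the local Ollama server
# directly, with deterministic-ish sampling, to capture output for a bug report.
# Assumes `pip install ollama` and that the fp16 tag has been pulled.
import ollama

resp = ollama.chat(
    model="llama3.2-vision:11b-instruct-fp16",
    messages=[{
        "role": "user",
        "content": "Extract every row of the table in this image as CSV.",
        "images": ["table.png"],  # placeholder path to the test image
    }],
    options={"temperature": 0, "top_p": 1.0, "seed": 42},
)
print(resp["message"]["content"])
```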