quick question. i'm using llama-index for my rag pipeline with local llms. is there a way to run a non-quantized model with ollama? i'm hoping to run llama3.2-vision:11b but even the 11b-instruct-fp16 version seems too quantized for my vision task
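for reference, this is roughly how i'm wiring it up (assuming the llama-index-multi-modal-llms-ollama integration package; the exact class/kwarg names may need checking against whatever version you have installed):

```python
# Rough sketch: point llama-index at a specific Ollama tag for a vision task.
# Assumes `pip install llama-index llama-index-multi-modal-llms-ollama` and that
# `ollama pull llama3.2-vision:11b-instruct-fp16` has already been run.
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

# The -fp16 tag should be the unquantized half-precision weights; you can sanity
# check what Ollama actually pulled with `ollama show llama3.2-vision:11b-instruct-fp16`.
mm_llm = OllamaMultiModal(
    model="llama3.2-vision:11b-instruct-fp16",
    temperature=0.0,  # pin sampling so runs are comparable
)

# Load a test image and ask the model about it.
image_docs = SimpleDirectoryReader(input_files=["./test_image.jpg"]).load_data()
resp = mm_llm.complete(prompt="Describe this image in detail.", image_documents=image_docs)
print(resp.text)
```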
Interesting. Maybe those two providers have different default configs for temperature or sampling params. Could also be that ollama handles images differently than Together AI 🤔 I know it took them quite a while to get vision support out. Might be worth a github issue on the ollama repo.
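If it were me I'd pin the sampling params on both backends first to rule that out, something like the sketch below. Class names are from the llama-index-llms-ollama and llama-index-llms-together integration packages, and the Together model id is just a guess, so double check both:

```python
# Hedged sketch: run the same prompt with the same sampling params on both
# backends, so provider defaults aren't the variable.
from llama_index.llms.ollama import Ollama
from llama_index.llms.together import TogetherLLM

# Text-only probe for simplicity; image-path differences would need the
# multi-modal interfaces instead.
PROMPT = "In two sentences, explain what quantization does to model weights."

ollama_llm = Ollama(
    model="llama3.2-vision:11b-instruct-fp16",
    temperature=0.0,
    request_timeout=300.0,  # fp16 vision models can be slow on first load
)

together_llm = TogetherLLM(
    # hypothetical model id, check Together's current catalog
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    api_key="...",  # your Together API key
    temperature=0.0,
    max_tokens=512,
)

# If outputs still diverge with identical sampling params, the difference is
# more likely in image preprocessing or prompt templating than in decoding.
print(ollama_llm.complete(PROMPT).text)
print(together_llm.complete(PROMPT).text)
```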