Any idea why this is taking 18 minutes

Any idea why this is taking 18 minutes for a simple query? I'm using Mistral through Ollama locally.
[Attachment: image.png]
5 comments
Are you running the model on GPU?
After loading the model, how much memory is left? An LLM's memory usage grows while it generates tokens, so if little memory is free, generation will take much longer.
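As a quick way to measure this, here's a minimal sketch that checks free RAM and times generation through Ollama's local HTTP API (assuming the default endpoint on localhost:11434 and that you've already run `ollama pull mistral`):

```python
# Minimal sketch: check available RAM, then measure Ollama's generation speed.
import json
import urllib.request

import psutil  # pip install psutil

print(f"Available RAM: {psutil.virtual_memory().available / 2**30:.1f} GiB")

payload = json.dumps({
    "model": "mistral",
    "prompt": "Say hello in one sentence.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Ollama reports durations in nanoseconds.
tokens = result["eval_count"]
seconds = result["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s ({tokens / seconds:.2f} tok/s)")
```

If you see well under ~1 token/s here, the model is being starved for memory or running purely on CPU.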
Nope, I'm running it on a normal laptop with 16 GB RAM. It's using 100% of the RAM and GPU, so I'm assuming the laptop is the reason this is happening. Would running it on a GPU make the response time normal?
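Before blaming the hardware, you can ask Ollama where it actually loaded the model. A small sketch (assuming the `ollama` CLI is on your PATH; `ollama ps` prints a PROCESSOR column such as `100% CPU`):

```python
# Quick check: ask Ollama which processor(s) the loaded model is running on.
import subprocess

print(subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout)
```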
Yeah, a CPU is not good for running LLMs, and a big LLM on CPU is definitely a no.
Try running it on Colab.
But I'm using Ollama for the models. How would that work on Colab?
Ollama allows you to run models in Colab. You can find more on their GitHub repo or with a simple Google search.
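As a rough sketch of what that looks like (run in a Colab notebook with a GPU runtime selected; assumes Ollama's official install script at https://ollama.com/install.sh and that five seconds is enough for the server to start):

```python
# Minimal Colab sketch: install Ollama, start its server, pull and query Mistral.
import subprocess
import time

# Install Ollama via its official install script.
subprocess.run("curl -fsSL https://ollama.com/install.sh | sh", shell=True, check=True)

# Start the Ollama server in the background and give it a moment to boot.
server = subprocess.Popen(["ollama", "serve"])
time.sleep(5)

# Pull the model, then run a single prompt against it.
subprocess.run(["ollama", "pull", "mistral"], check=True)
out = subprocess.run(
    ["ollama", "run", "mistral", "Say hello in one sentence."],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
```

With the Colab GPU attached, Ollama should offload the model automatically and a simple query should come back in seconds rather than minutes.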