why does it take 15-20 seconds to get an answer when I use ollama(llama3) with llama_index?
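
For reference, here is a minimal sketch of the kind of setup being described (an assumption, not the asker's actual code), querying a local llama3 model through llama_index's Ollama integration:

```python
# Minimal sketch (assumed setup): a local llama3 model queried through
# llama_index's Ollama integration. Requires a running `ollama serve`
# with the llama3 model already pulled.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", request_timeout=120.0)
response = llm.complete("Why is the sky blue?")
print(response)
```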
4 comments
because running a local LLM (especially when using the full context window) takes a lot of resources and can be slow, depending on your hardware and the libraries you use to run the model
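
If the full context window is part of the problem, one possible mitigation (assuming llama_index's Ollama wrapper and its context_window parameter) is to cap it, so the server allocates a smaller KV cache and spends less time on prompt processing:

```python
# Hedged sketch: capping the context window. context_window is a
# llama_index Ollama parameter; 2048 is an arbitrary example value,
# not a recommendation.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", context_window=2048, request_timeout=120.0)
print(llm.complete("Explain context windows in one sentence."))
```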
I expected that something that takes no time in the terminal shouldn't take any time in the .py file either
The bigger the input, the longer it takes
I doubt that in the terminal you are pasting in several paragraphs as input
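
One quick way to check the input-size effect is to time the same model on a short prompt versus a multi-paragraph one. This is a rough sketch with made-up prompts, not a proper benchmark:

```python
# Rough timing sketch (made-up prompts): the longer prompt should take
# noticeably more time, mostly spent in prompt processing.
import time

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", request_timeout=300.0)

short_prompt = "Hi."
long_prompt = "Summarize:\n" + ("Some paragraph of filler text. " * 200)

for prompt in (short_prompt, long_prompt):
    start = time.perf_counter()
    llm.complete(prompt)
    print(f"{len(prompt)} chars -> {time.perf_counter() - start:.1f}s")
```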