I do have an NVIDIA RTX 3060. The prime reason for using LlamaCPP was running a language model stored locally. Since we are building a RAG chatbot that will not be connected to the internet at the client site, it needs to be bundled with everything it needs to run before shipping out. If I can do that with Ollama, I will never use LlamaCPP again.