Hello. I am using LlamaIndex for a chatbot/query engine with a local 7B-parameter model running through llama.cpp. Each generation takes about 1.5 minutes. Is that normal? I have 32 GB of system RAM and 4 GB of VRAM. How does one estimate how much resources are needed, and what factors improve inference speed? The data being indexed is just three text files. I remember things being faster earlier.
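For reference, my setup looks roughly like the sketch below. It is simplified: the model path, `n_gpu_layers` value, and embedding model name are illustrative stand-ins, not my exact configuration.

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP

# Local 7B model served through llama.cpp. n_gpu_layers controls how many
# transformer layers are offloaded to the GPU (0 = CPU-only); the path and
# layer count here are illustrative.
Settings.llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # illustrative path
    context_window=2048,
    max_new_tokens=256,
    model_kwargs={"n_gpu_layers": 20},
    verbose=True,
)

# A local embedding model so indexing stays fully offline (model name assumed).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Index the three text files and query them.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("example question"))
```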