Hello, I have a general question: when using local models, what do you guys think would be the best (free) choice of inference engine? LlamaIndex supports vLLM, Ollama, llama.cpp, Hugging Face Transformers, and a lot of other integrations.
This is my setup:
- 2 NVIDIA Quadro P4000 GPUs with 8 GB of VRAM each (16 GB total)
- Intel Xeon @ 3.70 GHz
- 32 GB of RAM
The model I'm trying to use is Mistral 7B-Instruct-v0.1.
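For context, this is roughly what I have in mind if I go the Ollama route (just a sketch, assuming the `llama-index-llms-ollama` integration package and that Ollama is already serving the model locally; the `mistral:7b-instruct` tag and the timeout are my guesses, not something I've verified on this hardware):

```python
# Sketch only: assumes `pip install llama-index-llms-ollama` and a running
# Ollama server that already has a Mistral 7B Instruct model pulled.
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="mistral:7b-instruct",  # assumed Ollama tag for Mistral 7B Instruct
    request_timeout=120.0,        # generous timeout; a 7B model on 8 GB cards can be slow to warm up
)

response = llm.complete("Explain what an inference engine is in one sentence.")
print(response.text)
```

The other integrations (vLLM, LlamaCPP, HuggingFaceLLM) would slot into the same `llm` spot, which is why I'm mainly asking which engine makes the most sense for this hardware.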