Seems like either an issue with cuda, or you didn't properly install vllm? (If you are in a notebook, you sometimes need to restart the kernel after installing packages)
and i went through the llama.cpp demo; although there was a warning, the code ran and a result was generated
It can be different for different systems
i never manually configured them, i am on nixos and it was automatically enabled from configuration.nix. i tested it with ollama and llama.cpp and both ran without error; on ollama the CPU/GPU split is usually about 85/15
which is weird, but things are running as intended. I haven't tried gpu layer offloading with llama.cpp yet, but i will at some point
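for context, the relevant bits of my configuration.nix look roughly like this; i'm writing the option names from memory, so treat them as approximate rather than exact:

# assumed NixOS options for CUDA-accelerated ollama; verify against your nixpkgs channel
nixpkgs.config.cudaSupport = true;
services.ollama = {
  enable = true;
  acceleration = "cuda";
};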
you are right, i might have to ditch llama-index for a bit just to test vllm
i think i was only supposed to run pip install llama-index-llms-vllm
but i ran pip install vllm
before that
it triggers a circular import, but oh my god, i was following exactly what the post says,
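in hindsight, the clean way to rule out leftovers from earlier installs is probably a fresh virtual environment and then the documented install order, something like:

python -m venv .venv
source .venv/bin/activate
pip install vllm
pip install llama-index-llms-vllm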
There are two modes of using vLLM: local and remote. Let's start from the former, which requires a CUDA environment available locally.
Install vLLM
pip install vllm
or, if you want, you can compile it from source
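For reference, the from-source route in the vLLM README is roughly the following (it needs a CUDA toolchain and takes a while to build):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .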
Orca-7b Completion Example
%pip install llama-index-llms-vllm
import os
os.environ["HF_HOME"] = "model/"
from llama_index.llms.vllm import Vllm
llm = Vllm(
    model="microsoft/Orca-2-7b",
    tensor_parallel_size=4,
    max_new_tokens=100,
    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
)
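After that, usage goes through the standard llama-index completion interface; a minimal sketch (the prompt here is just an example):

# send a prompt to the local vLLM instance and print the generated text
response = llm.complete("What is a black hole?")
print(response.text)

Note that tensor_parallel_size=4 assumes 4 GPUs; on a single-GPU machine you'd set it to 1.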
may I ask which operating system you are running on? i mean, it is rather irrelevant since you tested on colab, but i am very curious.
as things turned out, it does not really have anything to do with llama-index, since that is essentially just a wrapper for developing on local LLMs.
It has everything to do with how my python environment is set up with cuda enabled; that is not required by llama-index in any way, but i do need it for torch and transformers.
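a quick sanity check to confirm torch actually sees the GPU (plain torch API, nothing llama-index specific):

import torch

print(torch.cuda.is_available())  # True only if torch was built with CUDA and can see a GPU
print(torch.version.cuda)         # CUDA version torch was compiled against (None on CPU-only builds)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU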
and because i am using nixos, things can be a little more involved. I don't believe other linux distros are necessarily easier, but this is tricky to set up, especially when you want to enable cuda as well.
I think macos is the easiest to work with? maybe? but right now this linux machine is all i have. so ye...
not you guys' fault, my bad because i am a big NOOB 🤣
ha no worries. Yea I'm running on macos locally, and google colab is some linux runtime I think. Installing cuda stuff/torch on linux is definitely a pain