Seems like either an issue with cuda, or you didn't properly install vllm? (If you are in a notebook, you sometimes need to restart the kernel after installing packages)
and i went through the llama.cpp demo; although there was a warning, the code ran and a result was generated
It can be different for different systems
i never manually configured them, i am on nixos and it was automatically enabled from configuration.nix. i tested it with ollama and llama.cpp and both ran without error; on ollama the CPU/GPU split is usually about 85/15
which is weird, but things are running as intended. I haven't tried gpu layer offloading with llama.cpp yet, but i will at some point
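for context, the relevant bits of my configuration.nix look roughly like this; i'm writing the option names from memory, so treat them as approximate rather than exact:

# assumed NixOS options for CUDA-accelerated ollama; verify against your nixpkgs channel
nixpkgs.config.cudaSupport = true;
services.ollama = {
  enable = true;
  acceleration = "cuda";
};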
you are right, i might have to ditch llama-index for a bit just to test vllm
i think i was only supposed to run pip install llama-index-llms-vllm
but i ran pip install vllm
before that
it triggers a circular import, but oh my god, i was following exactly what the post says,
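in hindsight, the clean way to rule out leftovers from earlier installs is probably a fresh virtual environment and then the documented install order, something like:

python -m venv .venv
source .venv/bin/activate
pip install vllm
pip install llama-index-llms-vllm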
There are two modes of using vLLM: local and remote. Let's start from the former, which requires a CUDA environment available locally.
Install vLLM
pip install vllm
or, if you want, you can compile it from source
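For reference, the from-source route in the vLLM README is roughly the following (it needs a CUDA toolchain and takes a while to build):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .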
Orca-7b Completion Example
%pip install llama-index-llms-vllm
import os
os.environ["HF_HOME"] = "model/"
from llama_index.llms.vllm import Vllm
llm = Vllm(
    model="microsoft/Orca-2-7b",
    tensor_parallel_size=4,
    max_new_tokens=100,
    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
)
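After that, usage goes through the standard llama-index completion interface; a minimal sketch (the prompt here is just an example):

# send a prompt to the local vLLM instance and print the generated text
response = llm.complete("What is a black hole?")
print(response.text)

Note that tensor_parallel_size=4 assumes 4 GPUs; on a single-GPU machine you'd set it to 1.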
may I ask which operating system you are running on? i mean, it is rather irrelevant since you tested on colab, but i am very curious.
as things turned out, it does not really have anything to do with llama-index, since that is essentially just a wrapper for developing on local LLMs.
It has everything to do with how my python environment is set up with cuda enabled; that is not required by llama-index in any way, but i do need it for torch and transformers.
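a quick sanity check to confirm torch actually sees the GPU (plain torch API, nothing llama-index specific):

import torch

print(torch.cuda.is_available())  # True only if torch was built with CUDA and can see a GPU
print(torch.version.cuda)         # CUDA version torch was compiled against (None on CPU-only builds)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU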
and because i am using nixos, things can be a little more involved. I don't believe other linux distros are necessarily easier, but this is tricky to set up, especially when you want to enable cuda as well.
I think macos is the easiest to work with? maybe? but right now this linux machine is all i have. so ye...
not you guys' fault, my bad because i am a big NOOB 🤣
ha no worries. Yea I'm running on macos locally, and google colab is some linux runtime I think. Installing cuda stuff/torch on linux is definitely a pain