Model not offloading on GPU

Model not offloading on GPU: I tried many things all week, and only oobabooga seems able to do it with n_gpu_layers. All the other scripts I tried seem to ignore it or something?
Attachment
image.png
Did you install llama-cpp-python compiled for your GPU? Not 100% sure if the flags shown there indicate that
(I thought BLAS=1 should be there?)
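(A quick way to check is to load the GGUF directly with llama-cpp-python and watch the startup log; on a CUDA-enabled build it should report BLAS = 1 and mention layers being offloaded when n_gpu_layers is set. A minimal sketch, with a placeholder model path:)
Plain Text
from llama_cpp import Llama

# On a GPU-enabled build, the verbose startup log should show BLAS = 1
# and lines about layers being offloaded to the GPU.
llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder path
    n_gpu_layers=50,
    verbose=True,
)
print(llm("AI is going to", max_tokens=32))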
Aah wooow I finally got it working with ctransformers. Is ctransformers compatible with llamaindex? Can I use the loaded model for llamaindex?
Or would I still need to get llamacpp working somehow?
I will check if I can compile llama-cpp-python. I just pip installed it on Windows 10, could that be it?
Yea there are super specific install instructions for GPU support with llama-cpp-python

Sadly no ctransformers integration yet
Or there are a few other install instructions there as well.
Thank you a lot for guiding me, I'll try this once I get home. One more question in the meantime: I got ctransformers running through langchain, and I believe I read langchain and llamaindex work together. Would it be possible to build on top of this, or should I import the LLM from llamaindex because it will not understand it?
Yea you can just pass the langchain LLM into llamaindex

Plain Text
from llama_index.llms import LangChainLLM

llm = LangChainLLM(lc_llm)
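(To use the wrapped LLM for actual LlamaIndex queries rather than just raw completions, one option, assuming the legacy ServiceContext API that matches the llama_index.llms import path used above and a hypothetical data folder, is roughly:)
Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import LangChainLLM

# lc_llm is the LangChain LLM (e.g. CTransformers) created elsewhere
llm = LangChainLLM(lc_llm)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

documents = SimpleDirectoryReader("data").load_data()  # hypothetical data folder
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What is this document about?"))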
Awesomeeeeee I got it working!!! πŸ˜„
Thank you so much!
For anyone facing the same issue, here is my code:
Plain Text
from langchain.llms import CTransformers
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from llama_index.llms import LangChainLLM

path="/path/to/model/mistral-11b-omnimix-bf16.Q4_K_M.gguf"
callback=[StreamingStdOutCallbackHandler()]

prompt="AI is going to"

config = {'max_new_tokens': 512, 'gpu_layers': 50}
llm = CTransformers(model=path, callbacks=callback, config=config)

response = LangChainLLM(llm=llm(prompt))
The next issue is this error:
Plain Text
llama_index/llms/langchain.py", line 111, in stream_complete
    raise ValueError("LLM must support streaming.")
ValueError: LLM must support streaming.

Changed response to LangChainLLM(llm=llm) and added stream_complete:
Plain Text
response=LangChainLLM(llm=llm)

response_gen = response.stream_complete("Hi this is")
for delta in response_gen:
    print(delta.delta, end="")

Also added 'stream': True:
Plain Text
config = {'max_new_tokens': 100, 'gpu_layers': 50, 'stream': True}

The same error is seen in another script where I'm using chat_engine

Streaming should work for ctransformers afaik. I don't know what I'm missing here; can't find anything else in the docs atm
Docs I checked:
https://python.langchain.com/docs/integrations/llms/ctransformers
https://docs.llamaindex.ai/en/stable/examples/llm/langchain.html
https://github.com/marella/ctransformers#config
Try this (it's kind of a naive check on our part; langchain doesn't have a consistent pattern for this)

Plain Text
llm = CTransformers(model=path, callbacks=callback, config=config)
llm.streaming = True
...
Good try, but it doesn't work for me:
Plain Text
pydantic/v1/main.py", line 357, in __setattr__
    raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
ValueError: "CTransformers" object has no field "streaming"

Also tried llm.stream = True
Same error
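(If streaming through LlamaIndex stays blocked, a non-streaming call should still go through the same wrapper; a minimal sketch, reusing the llm object from above:)
Plain Text
from llama_index.llms import LangChainLLM

wrapped = LangChainLLM(llm=llm)                   # llm is the CTransformers instance from earlier
completion = wrapped.complete("AI is going to")   # blocking call, no streaming needed
print(completion.text)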
Also tried this, still no GPU:
Attachments
image.png
image.png
Plain Text
from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path=model_path,
    temperature=0.0,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={'n_gpu_layers': 50},  # should offload layers to the GPU
    verbose=True,
)
Oh, you'll need many more options on the install
Do you have a CUDA GPU? Run this:
I do
Attachment
image.png
Plain Text
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
Or, in PowerShell on Windows:
Plain Text
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
windooooowwss
idk how to set env vars in windows lol
I use WSL for everything
are you using CMD? Powershell?
cmd, I can try ps
command works on ps ^^ lemme try
Should I also do something inside docker, since I'm running the script from there atm?
probably could also run it outside docker to try
dockerrrrrr

Yea you have to launch docker with GPUs exposed (I think it's just docker run --gpus all ... )
Yeah, then that should be fine. I got GPU working from there before; I just mean with llama-cpp-python
ah yea I see -- I don't think there should be anything special beyond that πŸ€”
Ok, still no GPU in docker, let me set the script up for local
Attachment
image.png
ohhhhhhh it seems to be working locally
damnnnnn niceeee ❀️ thanks a lot
YESSSS docker also works after this command ❀️
no more need for ctransformers ^^
niceeee πŸ’ͺ