Model not offloading on GPU

Model not offloading to GPU. I tried many things all week; only oobabooga seems able to do it with n_gpu_layers, and all the other scripts I tried seem to ignore it or something?
[Attachment: image.png]
Did you install llama-cpp-python compiled for your GPU? I'm not 100% sure the flags shown there indicate that.
(I thought BLAS=1 should be there?)
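A minimal way to check that (a sketch, not from the thread; the model path is a placeholder) is to load any GGUF model with verbose=True and look for BLAS = 1 and the offload lines in the startup log:
Plain Text
# Hypothetical check: load a model verbosely and inspect the startup log.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder path
    n_gpu_layers=50,
    verbose=True,
)
# A CUDA-enabled build should report "BLAS = 1" and an
# "offloaded ... layers to GPU" line; a CPU-only build will not.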
Aah wooow, I finally got it working with ctransformers. Is ctransformers compatible with llamaindex? Can I use the loaded model with llamaindex?
Or would I still need to get llama-cpp-python working somehow?
I will check if I can compile llama-cpp-python; I just pip installed it on Windows 10, could that be it?
Yea there's super specific install instructions for GPU support with llama-cpp-python

Sadly no ctransformers integration yet
Or there's a few other install instructions there as well.
Thank you a lot for guiding me, I'll try this once I get home. One more question in the meantime: I got ctransformers running through langchain, and I believe I read that langchain and llamaindex work together. Would it be possible to build on top of this, or should I import the LLM from llamaindex because it won't understand it?
Yea you can just pass the langchain LLM into llamaindex

Plain Text
from llama_index.llms import LangChainLLM

llm = LangChainLLM(lc_llm)
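Spelled out a bit more (a hypothetical sketch building on the snippet above; the model path and config values are placeholders), wrapping a langchain CTransformers LLM for llamaindex looks roughly like:
Plain Text
# Hypothetical sketch: wrap a langchain CTransformers LLM for llama_index.
from langchain.llms import CTransformers
from llama_index.llms import LangChainLLM

lc_llm = CTransformers(
    model="/path/to/model.gguf",  # placeholder path
    config={"max_new_tokens": 256, "gpu_layers": 50},
)
llm = LangChainLLM(llm=lc_llm)

# The wrapper exposes the usual llama_index LLM methods, e.g. complete().
print(llm.complete("AI is going to").text)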
Awesomeeeeee I got it working!!! πŸ˜„
Thank you so much!
For anyone facing the same issue, here is my code:
Plain Text
from langchain.llms import CTransformers
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from llama_index.llms import LangChainLLM

path="/path/to/model/mistral-11b-omnimix-bf16.Q4_K_M.gguf"
callback=[StreamingStdOutCallbackHandler()]

prompt="AI is going to"

config = {'max_new_tokens': 512, 'gpu_layers': 50}
llm = CTransformers(model=path, callbacks=callback, config=config)

response = LangChainLLM(llm=llm(prompt))
Next issue is this error:
Plain Text
llama_index/llms/langchain.py", line 111, in stream_complete
    raise ValueError("LLM must support streaming.")
ValueError: LLM must support streaming.

Changed response to LangChainLLM(llm=llm) and added stream_complete:
Plain Text
response=LangChainLLM(llm=llm)

response_gen = response.stream_complete("Hi this is")
for delta in response_gen:
    print(delta.delta, end="")

Also added 'stream': True:
Plain Text
config = {'max_new_tokens': 100, 'gpu_layers': 50, 'stream': True}

The same error shows up in another script where I'm using chat_engine

Streaming should work for ctransformers afaik; I don't know what I am missing here, can't find anything else in the docs atm
Docs I checked:
https://python.langchain.com/docs/integrations/llms/ctransformers
https://docs.llamaindex.ai/en/stable/examples/llm/langchain.html
https://github.com/marella/ctransformers#config
try this (it's kind of a naive check on our part, langchain doesn't have a consistent pattern for this)

Plain Text
llm = CTransformers(model=path, callbacks=callback, config=config)
llm.streaming = True
...
Good try, but doesn't work for me
Plain Text
pydantic/v1/main.py", line 357, in __setattr__
    raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
ValueError: "CTransformers" object has no field "streaming"

Also tried llm.stream = True
Same error
Also tried this, still no GPU:
[Attachments: image.png, image.png]
Plain Text
llm = LlamaCPP(
    model_path=model_path,
    temperature=0.0,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={'n_gpu_layers': 50},
    verbose=True,
)
oh you'll need many more options on the install
do you have a CUDA GPU? run this (Linux/WSL shell):
Plain Text
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
or, in PowerShell:
Plain Text
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
idk how to set env vars in windows lol
I use WSL for everything
are you using CMD? PowerShell?
CMD, I can try PS
the command works on PS ^^ lemme try
should I also do something inside Docker, since I'm running the script from there atm
probably could also run it outside docker to try
dockerrrrrr

Yea, you have to launch docker with GPUs exposed (I think it's just docker run --gpus all ... )
yeah then that should be fine, I got the GPU working from there before, I just mean with llama-cpp-python
ah yea I see -- I don't think there should be anything special beyond that πŸ€”
ok still no GPU in docker, let me set the script up to run locally
[Attachment: image.png]
ohhhhhhh it seems to be working locally
damnnnnn niceeee ❀️ thanks a lot
YESSSS docker also works after this command ❀️
no more need for ctransformers ^^
niceeee πŸ’ͺ