I think Logan mentioned somewhere that if a GPU is present, the embedding model will use it directly.
Hmm, weird... I'm getting high CPU usage
What am I supposed to set the n_gpu_layers kwarg to?
Ok, I can confirm that even during inference, it's not using the GPU
It was working before, not sure what I changed. Steps to troubleshoot? I'm gonna play with n_gpu_layers and also double-check the LLM settings
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
That's from the verbose output
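(For context, I got those lines by constructing the LlamaCPP LLM with verbose=True — rough sketch of what I'm running, assuming the newer modular import path; the model path is just a placeholder:)

from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    verbose=True,  # prints the ggml_metal_init lines above when the model loads
)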
I work with Windows and Ubuntu machines, lol, so I can't be of any help with Mac 😅
I'll pose the question on the main thread then
I think you can wait for Logan
Just as an update, I just tested my regular install of llama.cpp and the GPU usage goes up to 75%
so it's definitely something I've done wrong
I suspect it occurred when I tried my fresh install of llama-index
Looks like I need to clean up and try again later
Ah great! That's a good sign
Ok so I've fully removed llama-index and llama-cpp-python from my virtual environment
I suppose I need to ask if there are specific instructions to be followed for metal to work
The documentation on the page is outdated: it talks about llama-cpp-python version 1.6, whereas when I compiled and installed it, it was on 2.6
Yeah, I'm unable to get this to work
(I would just use Ollama low-key, llama-cpp is a nightmare)
Interesting ok lemme try that
It’s just the implementation of llama-cpp through LlamaIndex that’s not working though
Regular llama-cpp works fine
I’ll try using Ollama, but its documentation for use with LlamaIndex seems to assume you're just pointing it at a server that's already running. I wanted to run a single instance in each script without running the server
Should I use a different llm integration?
@Logan M is there any way for me to get llama-cpp working in llamaindex?
Please let me know if tagging you is not allowed
Ollama is a server, yes; it's just way easier to configure compared to llama.cpp. Tbh I much prefer using it, but that's just me
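Rough sketch of the Ollama integration, assuming the server is already running locally on the default port and you've pulled a model (the model name here is just an example):

from llama_index.llms.ollama import Ollama

# Assumes `ollama serve` is running and `ollama pull llama2` has been done already
llm = Ollama(model="llama2", request_timeout=120.0)
print(llm.complete("Say hello in one short sentence."))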
I'm not a llama cpp expert. I just know there's super specific installation instructions, plus you have to set n_gpu_layers to -1
or some other non-zero value
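If you want to sanity-check GPU offload outside of llama-index first, something like this with plain llama-cpp-python should do it (the model path is just a placeholder):

from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 offloads every layer; any positive number offloads that many
    verbose=True,     # watch for the ggml_metal_init lines to confirm Metal is active
)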
Yeah the gpu layers is specific and weird for sure
I think you have to set it to 1
Also, I may have somewhat figured out where the problem is coming from. Where/who should I speak to in case it's a bug? I'm not used to submitting bugs and requests; it will probably be my first time.
I think the llama-index llama-cpp utils are not updated to use the GPU-specific version of llama-cpp
Like, I know llama-cpp-python has specific instructions for installing on Metal
CMAKE_ARGS="-DLLAMA_METAL=on" pip install -U llama-cpp-python --no-cache --force-reinstall
I tried it multiple times
Finally, I went backwards
I did the llama cpp install first
And then the code didn’t recognize llm=LlamaCPP
Until I manually installed llama-index-llms-llama-cpp
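(For reference, this is the import that only started resolving after that install — just showing what I mean:)

# Fails with ImportError until llama-index-llms-llama-cpp is installed
from llama_index.llms.llama_cpp import LlamaCPP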
So clearly the LlamaCPP parameters are being picked up by something in LlamaIndex, but for whatever reason it isn’t hooking into regular llama-cpp-python in a way that actually runs it on the GPU
I really would just use ollama and figure out the server thing. This isn't worth the headache to run an LLM at 20 Tokens/Second lol
Ok I'll take a look at the link you've sent as well then.
I hope you don't mind that I'm fixated on llama-cpp... as far as I know, Ollama doesn't work with my use case, which is that each script will run a different LLM each time. I'll attempt to ask about it on the main group again if that's alright. Thanks for your help thus far.
I took a look at the link you sent, and interestingly there's no field for n_gpu_layers in it
I think that parameter isn't getting passed through to llama-cpp
n_gpu_layers is passed in with model_kwargs
llm = LlamaCPP(
...
model_kwargs={"n_gpu_layers": -1},
)
I think it's a positive 1 if I'm not wrong... I'll try the negative 1 just to be sure
I'm going off the documentation
-1 will offload all layers to GPU
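Putting it all together, something like this is what I'd expect to work — just a sketch, the model path and generation settings are examples:

from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # example path
    temperature=0.1,
    max_new_tokens=256,
    context_window=4096,
    model_kwargs={"n_gpu_layers": -1},  # offload all layers to the GPU
    verbose=True,  # the ggml_metal_init lines should show up during load if Metal is used
)
print(llm.complete("Hello"))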