Updated 3 months ago

Could anyone familiar with getting LlamaIndex working with llama.cpp on macOS/Apple Silicon please message me to help me with something? It has to do with getting the GPU to work.
10 comments
Just for you, I spun this up on my Mac 😉

Here are the steps

In a fresh terminal
python -m venv venv
source venv/bin/activate
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python --no-cache-dir --force-reinstall
pip install llama-index llama-index-llms-llama-cpp


Then, I ran this code

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

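# Placeholder (not in the original post): set this to the URL of your model file
model_url = "<URL of your model file>"

llm = LlamaCPP(
    # You can pass in the URL to a model file to download it automatically
    # (the original comment says GGML; recent llama.cpp builds expect GGUF files)
    model_url=model_url,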
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # -1 offloads all layers to the GPU; a positive value offloads that many layers
    model_kwargs={"n_gpu_layers": -1},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


And in the terminal, I see
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
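
If you want to check that log line programmatically instead of eyeballing it, here is a minimal sketch. It assumes you have copied the terminal output into a string or file yourself; the helper names `parse_offload` and `fully_offloaded` are made up for illustration, and the log wording may differ between llama.cpp versions.

```python
import re

# Matches llama.cpp load lines such as "llm_load_tensors: offloaded 41/41 layers to GPU".
# The prefix is deliberately not part of the pattern, since it may vary by version.
_OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def parse_offload(log_text):
    """Return (offloaded, total) from the last matching line, or None if there is none."""
    matches = _OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    offloaded, total = matches[-1]
    return int(offloaded), int(total)


def fully_offloaded(log_text):
    """True only when a matching line exists and every layer was offloaded."""
    result = parse_offload(log_text)
    return result is not None and result[0] == result[1]


# Sample text copied from the terminal output quoted in this thread
sample = """llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU"""

print(parse_offload(sample))
print(fully_offloaded(sample))
```

If no matching line shows up at all, that would suggest the GPU path is not being used (or the log format changed).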


I get about 20 tokens/sec after testing with a few prompts
Wow ok thanks 😅
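
For anyone who wants a rough number of their own, here is one way to sketch a throughput measurement. This is an approximation under stated assumptions: `tokens_per_second` and `fake_generate` are hypothetical names, and counting whitespace-separated words is only a crude stand-in for real token counts.

```python
import time


def tokens_per_second(generate, prompt, count_tokens=lambda text: len(text.split())):
    """Time one call to generate(prompt); return (text, approximate tokens/sec).

    count_tokens defaults to a whitespace word count, which only approximates
    the tokenizer's real count.
    """
    start = time.perf_counter()
    text = generate(prompt)
    elapsed = time.perf_counter() - start
    n_tokens = count_tokens(text)
    rate = n_tokens / elapsed if elapsed > 0 else float("inf")
    return text, rate


# Stand-in generator so the sketch runs without a model loaded
def fake_generate(prompt):
    time.sleep(0.05)
    return "one two three four five"


text, rate = tokens_per_second(fake_generate, "hello")
print(f"{rate:.1f} tokens/sec (approximate)")
```

With the `llm` object from the code above, you could pass `lambda p: llm.complete(p).text` as `generate`. With verbose output on, llama.cpp typically prints its own timing summary too, which should be more accurate than this estimate.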
Lemme run it and try
And yeah that's what I'm looking for in my terminal as well
It's working, thank you so much!! While reinstalling I discovered that I had installed llama-index at some point in the past, so I had to go to my original Python location and delete the old package files there
But yeah, that and setting n_gpu_layers to -1 got it working
I finally got the GPU to be used
Thank you for putting in the effort
As usual it was user error 😅
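
For anyone hitting the same stale-install problem, a small sketch like the following can show which interpreter is running and where its llama-related packages live. It uses only the standard library; the function names are made up for illustration, and it only reports what the current interpreter can see, so run it once inside the venv and once with the system Python to compare.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """True when running inside a venv (prefix differs from the base interpreter)."""
    return sys.prefix != sys.base_prefix


def find_distributions(prefix="llama"):
    """List (name, version, location) for installed distributions whose name starts with prefix."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if name.lower().startswith(prefix.lower()):
            found.append((name, dist.version, str(dist.locate_file(""))))
    # Sort by name only, so entries with missing versions cannot break the sort
    return sorted(found, key=lambda item: item[0].lower())


print("interpreter:", sys.executable)
print("inside venv:", in_virtualenv())
for name, version, location in find_distributions():
    print(f"{name} {version} -> {location}")
```

If the locations point outside the venv directory, the packages are probably coming from an older install rather than the fresh one.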
haha no worries! Glad to get it sorted