I had this working using the OpenAI API

I had this working using the OpenAI API, but now I want to move to using a local model. I see different references in the docs, so I wasn't sure exactly how to point to the local model. I think I'm supposed to set my service_context to 'local', but it's not clear how I then point it to the specific model. This is what I have now... Advice?

#specify a local model
from llama_cpp import Llama
from llama_index import ServiceContext
llm = Llama(model_path=r'C:\Users\erraballiv\Downloads\llama-2-7b.Q5_K_S.gguf')
service_context = ServiceContext.from_defaults(llm='local')



from llama_index import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=r"C:\Users\erraballiv\PycharmProjects\Ll-index-ex1")
loaded_index = load_index_from_storage(storage_context)
query_engine = loaded_index.as_query_engine()
Setting it to local will load the default local model (llama-2-chat-13b) with llama-cpp.

To use another model, you can set up the actual llama-cpp LLM like this:
https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html

And then pass the LLM into the service context.

Note that there's a small bug in the code at the moment from a fix from yesterday. Cutting a new release shortly, so maybe hold off if you have 0.8.24 😅
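In outline, that looks roughly like this (condensed from the linked notebook; the model path is a placeholder for wherever your GGUF file lives):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import LlamaCPP

# build the llama-cpp LLM from a model file you already have locally (placeholder path)
llm = LlamaCPP(model_url=None, model_path="path/to/your/model.gguf")

# then pass the LLM object itself (not the string 'local') into the service context
service_context = ServiceContext.from_defaults(llm=llm)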
Do I need to specify a path to that default local model? My program seems to be spinning here:

# specify a local model
from llama_index import ServiceContext
service_context = ServiceContext.from_defaults(llm='local')

# loading

from llama_index import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=r"C:\Users\erraballiv\PycharmProjects\Ll-index-ex1")
loaded_index = load_index_from_storage(storage_context)
query_engine = loaded_index.as_query_engine()


#Querying
response = query_engine.query("Give me a list of Sub-Processors")
print(response)
print("done")
(Also, thank you for your help, Logan!)
If you have a model downloaded, you can point to it like this:

Plain Text
from llama_index import ServiceContext
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    model_url=None,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path="path/to/my/model",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

service_context = ServiceContext.from_defaults(llm=llm)

from llama_index import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=r"C:\Users\erraballiv\PycharmProjects\Ll-index-ex1")

# NOTE: pass in service context when loading
loaded_index = load_index_from_storage(storage_context, service_context=service_context)
query_engine = loaded_index.as_query_engine()
You mentioned the default local model - I assumed that meant it was pre-downloaded with llama-index - is that inaccurate?
Not pre-downloaded. If you set llm="local", it downloads and caches the default model when you run (this can take a while, it's like 7GB)
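In other words, that one line from your snippet behaves roughly like this:

Plain Text
from llama_index import ServiceContext

# 'local' means: download and cache the default llama-cpp chat model on first run
# (roughly 7GB); nothing is bundled with llama-index itself
service_context = ServiceContext.from_defaults(llm="local")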
The service context is no longer required, I guess?
The model_url seems to be required for LlamaCPP? (It's not letting me leave that out or set it as None.)
pydantic.v1.error_wrappers.ValidationError: 1 validation error for LlamaCPP
model_url
none is not an allowed value (type=type_error.none.not_allowed)
yea I just fixed that today LOL
I actually just cut a new release
try updating -- pip install --upgrade llama-index
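To double-check that the upgrade took (i.e. you're past 0.8.24), you can print the installed version; the package exposes __version__:

Plain Text
import llama_index

# should show a release newer than 0.8.24 after the upgrade
print(llama_index.__version__)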
Even though I specified the model_path, it looks like that code is downloading that 13b model.
Hmm, but how is that possible 🤔 This is the source code:

Plain Text
# check if model is cached
if model_path is not None:
    if not os.path.exists(model_path):
        raise ValueError(
            "Provided model path does not exist. "
            "Please check the path or provide a model_url to download."
        )
    else:
        self._model = Llama(model_path=model_path, **model_kwargs)
else:
    cache_dir = get_cache_dir()
    model_name = os.path.basename(model_url)
    model_path = os.path.join(cache_dir, "models", model_name)
    if not os.path.exists(model_path):
        os.makedirs(os.path.dirname(model_path), exist_ok=True)
        self._download_url(model_url, model_path)
        assert os.path.exists(model_path)

    self._model = Llama(model_path=model_path, **model_kwargs)
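Given that check, an existing model_path should never hit the download branch, so it's worth sanity-checking the path on your side (placeholder path below):

Plain Text
import os

# placeholder path -- whatever you're actually passing as model_path to LlamaCPP
model_path = r"C:\path\to\llama-2-7b.Q5_K_S.gguf"

# per the source above: True -> LlamaCPP loads this file and never downloads anything;
# False -> it raises the ValueError instead of falling back to a download
print(os.path.exists(model_path))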
How did you set up the LLM/service context?