it is downloading another model for some reason

it is downloading another model for some reason... could it be that the Llama2 model I converted with llama.cpp wasn't compatible? also, after it spits out a lot of LLM stat info, I noticed it saying "Could not load OpenAIEmbedding. Using HuggingFaceBgeEmbeddings with model_name=BAAI/bge-small-en. If you intended to use OpenAI, please check your OPENAI_API_KEY." then, after downloading the tokenizer and other stuff from huggingface, it said "Could not load OpenAI model. Using default LlamaCPP=llama2-13b-chat. If you intended to use OpenAI, please check your OPENAI_API_KEY." and then started downloading the new model from huggingface... πŸ€¦β€β™‚οΈ does that sound about right? lol
15 comments
If you don't have an openai key set, it falls back to BAAI/bge-small-en for embeddings model and llama2-chat-13b for the LLM

You need to properly set up your service context to avoid this, or set your OpenAI key
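For example, here is a minimal sketch of that setup (assuming a llama_index release that still ships ServiceContext, and a placeholder model path) which keeps both the LLM and the embeddings local so neither falls back to OpenAI:

Plain Text
# Minimal sketch: local LLM via langchain's LlamaCpp plus a local embedding model,
# so nothing falls back to OpenAI when OPENAI_API_KEY is unset.
# The model path is a placeholder; embed_model="local" requires sentence-transformers.
from langchain.llms import LlamaCpp
from llama_index import ServiceContext, LLMPredictor, set_global_service_context

llm = LlamaCpp(model_path="models/your-model-q4_0.gguf", temperature=0.1, max_tokens=2000)

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=llm),
    embed_model="local",  # resolves to a local HuggingFace embedding model
)
set_global_service_context(service_context)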
I'm trying to run a local LLM lol
How did you set up your service context?
Plain Text
query_engine = index.as_query_engine(service_context=service_context) ... that?
I'm sorry, I'm still learning Python and I was trying to follow the docs here: https://docs.llamaindex.ai/en/stable/getting_started/customization.html


Hmm, try passing it into the index creation instead. Or just set a global, tbh

Plain Text
from llama_index import set_global_service_context

set_global_service_context(service_context)
dangit wrong window ..
query_engine = index.as_query_engine(service_context=service_context) ... that?
Yeah, close. It should probably just be a global instead, or pass it into the index creation

Plain Text
from llama_index import set_global_service_context

set_global_service_context(service_context)

# OR
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
this is my .py
import os
from langchain.llms import LlamaCpp
from llama_index import ServiceContext
from llama_index import VectorStoreIndex, SimpleDirectoryReader, LLMPredictor

# Set up the LLM model options

llm = LlamaCpp(
    model_path="models/bob-34b/ggml-model-q4_0.gguf",
    temperature=0.1,
    max_tokens=2000,
    top_p=1,
    verbose=True,
)

llm_predictor = LLMPredictor(llm=llm)


# Load documents and build index

documents = SimpleDirectoryReader('KnowledgeBase/pdf/', recursive=True, exclude_hidden=True).load_data()

# Create a ServiceContext instance with the custom tokenizer

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
index = VectorStoreIndex.from_documents(documents)

# Query index

query_engine = index.as_query_engine(service_context=service_context)
response = query_engine.query("what is python?")

# Print the response

print(response)
let me try this
let's see what that does lol
I'm not sure, it loses me here because there is no line error...
and I don't get why it's trying to use OpenAI when I'm giving it a Llama 2
Like I mentioned above, either set it as a global or pass it into the index creation

https://discord.com/channels/1059199217496772688/1162896173573619732/1162904899210715228
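Putting the two suggestions together, here is a sketch of the posted script with the service context set globally and also passed into index creation (embed_model="local" is an assumption here, to keep the embeddings off OpenAI as well):

Plain Text
from langchain.llms import LlamaCpp
from llama_index import (
    ServiceContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    set_global_service_context,
)

# Set up the LLM model options
llm = LlamaCpp(
    model_path="models/bob-34b/ggml-model-q4_0.gguf",
    temperature=0.1,
    max_tokens=2000,
    top_p=1,
    verbose=True,
)

# Service context with the local LLM; embed_model="local" is an assumption
# so embeddings also stay local instead of falling back to OpenAI
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=llm),
    embed_model="local",
)
set_global_service_context(service_context)

# Load documents and build the index, passing the service context in here
documents = SimpleDirectoryReader('KnowledgeBase/pdf/', recursive=True, exclude_hidden=True).load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("what is python?")
print(response)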