- `model_path`: This is the path to a pre-downloaded model. You've set it to `model_url`, which is a string containing the URL of the model you want to use; if you have a local model, you should provide its path here instead.
- `temperature`: This parameter controls the randomness of the model's output. A temperature of 0.0 makes the output deterministic, while higher values make it more random. You've set it to 0.0, so the output will be deterministic.
- `max_new_tokens`: This is the maximum number of new tokens the model will generate. You've set it to 2048.
- `context_window`: This is the number of tokens the model considers when generating new tokens. You've set it to 4096.
- `generate_kwargs` and `model_kwargs`: These are dictionaries of additional arguments passed to the model's `__call__()` and `__init__()` methods, respectively. You've set `generate_kwargs` to an empty dictionary and `model_kwargs` to `{"n_gpu_layers": -1}`, which offloads all model layers to the GPU.
- `messages_to_prompt` and `completion_to_prompt`: These are functions that format the model's inputs. You've set them to the `messages_to_prompt` and `completion_to_prompt` helpers, respectively.
- `verbose`: This controls whether the model prints detailed logs. You've set it to `False`, so it won't print logs.

```python
llm = LlamaCPP(
    model_path=model_url,
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
```
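For reference, the snippet above assumes the LlamaCPP integration and its prompt-formatting helpers are already imported. A minimal sketch of those imports (the package names in the install comment are my assumption; the import paths match the full example later in this post):

```python
# Assumed install step: pip install llama-index-llms-llama-cpp llama-cpp-python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,    # formats a list of chat messages into the model's prompt
    completion_to_prompt,  # wraps a plain completion string in the prompt template
)
```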
model_url = "https://huggingface.co/wenqiglantz/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf" llm = LlamaCPP( model_path=model_url, temperature=0.0, max_new_tokens=2048, context_window=4096, generate_kwargs={}, model_kwargs={"n_gpu_layers": -1}, messages_to_prompt=messages_to_prompt, completion_to_prompt=completion_to_prompt, verbose=False, )
```
ValueError                                Traceback (most recent call last)
<ipython-input-27-55640f8389ad> in <cell line: 32>()
     30
     31 # Initialize the LLM (e.g., LlamaCPP with your model)
---> 32 llm = LlamaCPP(
     33     model_path=model_url,
     34     temperature=0.0,

/usr/local/lib/python3.10/dist-packages/llama_index/llms/llama_cpp/base.py in __init__(self, model_url, model_path, temperature, max_new_tokens, context_window, callback_manager, generate_kwargs, model_kwargs, verbose, system_prompt, messages_to_prompt, completion_to_prompt, pydantic_program_mode, output_parser)
    102         if model_path is not None:
    103             if not os.path.exists(model_path):
--> 104                 raise ValueError(
    105                     "Provided model path does not exist. "
    106                     "Please check the path or provide a model_url to download."

ValueError: Provided model path does not exist. Please check the path or provide a model_url to download.
```
The problem is that `model_path` is set to `model_url`, which is a URL, not a local file path. The `model_path` parameter should be used if you have a pre-downloaded model on your local machine; if you want to download a model from a URL, use the `model_url` parameter instead (if you would rather keep a local path, see the sketch after the corrected code below):

```python
model_url = "https://huggingface.co/wenqiglantz/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf"

llm = LlamaCPP(
    model_url=model_url,  # let LlamaCPP download the model from the URL
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
```
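Alternatively, if you prefer to keep using `model_path`, you could download the GGUF file to disk first and point `model_path` at the local copy. A minimal sketch, assuming the `huggingface_hub` package is installed (the repo id and filename are taken from the URL above):

```python
from huggingface_hub import hf_hub_download

# Download the GGUF file once; later runs reuse the local cache.
local_model_path = hf_hub_download(
    repo_id="wenqiglantz/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q5_K_M.gguf",
)

llm = LlamaCPP(
    model_path=local_model_path,  # a real local path, so the existence check passes
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
```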
The key change in the corrected snippet is that `model_path` is replaced with `model_url`, which tells LlamaCPP to download the model from the provided URL. Running it, however, raised a new error:

```
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-30-06903de9c8be> in <cell line: 32>()
     30
     31 # Initialize the LLM (e.g., LlamaCPP with your model)
---> 32 llm = LlamaCPP(
     33     model_url=model_url,
     34     temperature=0.0,

2 frames
/usr/local/lib/python3.10/dist-packages/llama_cpp/_utils.py in __enter__(self)
     27             return self  # Return the instance without making changes
     28
---> 29         self.old_stdout_fileno_undup = self.sys.stdout.fileno()
     30         self.old_stderr_fileno_undup = self.sys.stderr.fileno()
     31

UnsupportedOperation: fileno
```
The `UnsupportedOperation: fileno` error typically occurs when code tries to redirect standard output (stdout) or standard error (stderr) in an environment that doesn't expose real file descriptors, such as Jupyter/Colab notebooks or certain IDEs. Here it is raised by llama-cpp-python's stdout/stderr suppression, which runs when `verbose=False`; a common workaround in notebooks is to pass `verbose=True` (as the final working code below does), so that redirection is skipped.

The next error appeared when running a query through the custom query engine:

```
ValueError                                Traceback (most recent call last)
<ipython-input-34-df89ae623e7e> in <cell line: 56>()
     54 # Now you can use this custom query engine to perform queries
     55 query_str = "What is the so called inspector view in spotlight?"
---> 56 response = original_query_engine.custom_query(query_str)
     57 print(response)

16 frames
/usr/local/lib/python3.10/dist-packages/llama_index/core/base/embeddings/base.py in similarity(embedding1, embedding2, mode)
     54         return np.dot(embedding1, embedding2)
     55     else:
---> 56         product = np.dot(embedding1, embedding2)
     57         norm = np.linalg.norm(embedding1) * np.linalg.norm(embedding2)
     58         return product / norm

ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)
```
The error `ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)` indicates a mismatch in the dimensions of the two vectors being passed to the dot product: `embedding1` has shape `(1536,)` and `embedding2` has shape `(384,)`, and the dot product requires the dimensions to match. In practice, 1536 is the dimension of OpenAI's default `text-embedding-ada-002` embeddings and 384 is the dimension of `BAAI/bge-small-en-v1.5`, so the likely cause is that the stored index and the query are using different embedding models — for example, an index persisted earlier with the OpenAI default being queried with the HuggingFace model (or vice versa). Without knowing exactly how `embedding1` and `embedding2` were generated, it's difficult to be more specific; refer back to the LlamaIndex documentation or the code that generated the embeddings. For reference, here is the setup that produced the error:

```python
PERSIST_DIR = "storage"

if not os.path.exists(f"{PERSIST_DIR}/docstore.json"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data/paul_graham").load_data()
    index = VectorStoreIndex.from_documents(
        documents,
        embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    )
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# Assuming the retriever and response synthesizer are set up as follows:
retriever = index.as_retriever(similarity_top_k=3, chunk_size=256, chunk_overlap=64)

# Initialize the LLM (e.g., LlamaCPP with your model)
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)

response_synthesizer = get_response_synthesizer(llm=llm)

# Instantiate the OriginalQueryEngine with the retriever and response synthesizer
original_query_engine = NewQueryEngine(
    llm=llm,
    glossary_path="./glossary.json",
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# Now you can use this custom query engine to perform queries
query_str = "What is the so called inspector view in spotlight?"
response = original_query_engine.custom_query(query_str)
print(response)
```
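One way to confirm and fix this (a sketch, reusing the imports from this post) is to check the dimension the embedding model produces and to make sure the same model is used both when the index is built and when it is queried, for example by setting `Settings.embed_model` globally before building or loading the index and rebuilding the persisted `storage` directory if it was created with a different model:

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# bge-small-en-v1.5 produces 384-dimensional vectors, while OpenAI's default
# text-embedding-ada-002 produces 1536-dimensional vectors -- the two shapes
# reported in the traceback above.
print(len(embed_model.get_text_embedding("dimension check")))  # -> 384

# Make this the global default *before* building or loading the index so the
# stored vectors and the query vectors come from the same model.
Settings.embed_model = embed_model

# If the "storage" directory was persisted with a different embedding model,
# delete it and rebuild the index so the dimensions match.
```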
```python
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
```
os.environ["OPENAI_API_KEY"] = "
```python
# Initialize the embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model
```
```python
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
)
```
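Note that once `Settings.embed_model` has been set as above, passing `embed_model` to `from_documents` is redundant: when the argument is omitted, LlamaIndex falls back to the global setting. A minimal sketch:

```python
# Settings.embed_model is already the HuggingFace model, so from_documents
# picks it up automatically and the explicit argument can be dropped.
index = VectorStoreIndex.from_documents(documents)
```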
```python
import os

from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
from llama_index.core import Settings

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # key redacted; never publish a real key

# Initialize the embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Initialize the LLM (e.g., LlamaCPP with your model)
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,  # verbose=True also avoids the fileno error in notebooks
)

# With this, only this embed model and LLM will be used everywhere
Settings.embed_model = embed_model
Settings.llm = llm

# Now proceed from here
```
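With `Settings` configured, the rest of the pipeline can rely on those defaults. As a quick sanity check, here is a sketch using the standard LlamaIndex query path (the data directory and question are the ones used earlier in this post):

```python
# Build the index; Settings.embed_model is picked up automatically.
documents = SimpleDirectoryReader("data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query through the default query engine; Settings.llm (the local LlamaCPP
# model) is used to synthesize the answer.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the so called inspector view in spotlight?")
print(response)
```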