- `model_path`: This is the path to a pre-downloaded model. You've set it to `model_url`, which is a string containing the URL of the model you want to use; if you have a local model, you should provide its path here instead.
- `temperature`: This parameter controls the randomness of the model's output. A temperature of 0.0 makes the output deterministic, while higher values make it more random. You've set it to 0.0, so the output will be deterministic.
- `max_new_tokens`: This is the maximum number of new tokens the model will generate. You've set it to 2048.
- `context_window`: This is the number of tokens the model considers when generating new tokens. You've set it to 4096.
- `generate_kwargs` and `model_kwargs`: These are dictionaries of additional arguments passed to the model's `__call__()` and `__init__()` methods, respectively. You've set `generate_kwargs` to an empty dictionary and `model_kwargs` to `{"n_gpu_layers": -1}`, which offloads all model layers to the GPU.
- `messages_to_prompt` and `completion_to_prompt`: These are functions that format the model's inputs. You've set them to the `messages_to_prompt` and `completion_to_prompt` helpers, respectively.
- `verbose`: This controls whether the model prints detailed logs. You've set it to `False`, so it won't print logs.

```python
llm = LlamaCPP(
    model_path=model_url,
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
```
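For reference, the snippet above assumes the LlamaCPP integration and its prompt-formatting helpers are already imported. A minimal sketch of those imports (the package names in the install comment are my assumption; the import paths match the full example later in this post):

```python
# Assumed install step: pip install llama-index-llms-llama-cpp llama-cpp-python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,    # formats a list of chat messages into the model's prompt
    completion_to_prompt,  # wraps a plain completion string in the prompt template
)
```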
model_url = "https://huggingface.co/wenqiglantz/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf" llm = LlamaCPP( model_path=model_url, temperature=0.0, max_new_tokens=2048, context_window=4096, generate_kwargs={}, model_kwargs={"n_gpu_layers": -1}, messages_to_prompt=messages_to_prompt, completion_to_prompt=completion_to_prompt, verbose=False, )
```
ValueError                                Traceback (most recent call last)
<ipython-input-27-55640f8389ad> in <cell line: 32>()
     30
     31 # Initialize the LLM (e.g., LlamaCPP with your model)
---> 32 llm = LlamaCPP(
     33     model_path=model_url,
     34     temperature=0.0,

/usr/local/lib/python3.10/dist-packages/llama_index/llms/llama_cpp/base.py in __init__(self, model_url, model_path, temperature, max_new_tokens, context_window, callback_manager, generate_kwargs, model_kwargs, verbose, system_prompt, messages_to_prompt, completion_to_prompt, pydantic_program_mode, output_parser)
    102         if model_path is not None:
    103             if not os.path.exists(model_path):
--> 104                 raise ValueError(
    105                     "Provided model path does not exist. "
    106                     "Please check the path or provide a model_url to download."

ValueError: Provided model path does not exist. Please check the path or provide a model_url to download.
```
The problem is that `model_path` is set to `model_url`, which is a URL, not a local file path. The `model_path` parameter should be used if you have a pre-downloaded model on your local machine; if you want to download a model from a URL, use the `model_url` parameter instead (if you would rather keep a local path, see the sketch after the corrected code below):

```python
model_url = "https://huggingface.co/wenqiglantz/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf"

llm = LlamaCPP(
    model_url=model_url,  # let LlamaCPP download the model from the URL
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
```
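Alternatively, if you prefer to keep using `model_path`, you could download the GGUF file to disk first and point `model_path` at the local copy. A minimal sketch, assuming the `huggingface_hub` package is installed (the repo id and filename are taken from the URL above):

```python
from huggingface_hub import hf_hub_download

# Download the GGUF file once; later runs reuse the local cache.
local_model_path = hf_hub_download(
    repo_id="wenqiglantz/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q5_K_M.gguf",
)

llm = LlamaCPP(
    model_path=local_model_path,  # a real local path, so the existence check passes
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
```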
The key change in the corrected snippet is that `model_path` is replaced with `model_url`, which tells LlamaCPP to download the model from the provided URL. Running it, however, raised a new error:

```
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-30-06903de9c8be> in <cell line: 32>()
     30
     31 # Initialize the LLM (e.g., LlamaCPP with your model)
---> 32 llm = LlamaCPP(
     33     model_url=model_url,
     34     temperature=0.0,

2 frames
/usr/local/lib/python3.10/dist-packages/llama_cpp/_utils.py in __enter__(self)
     27             return self  # Return the instance without making changes
     28
---> 29         self.old_stdout_fileno_undup = self.sys.stdout.fileno()
     30         self.old_stderr_fileno_undup = self.sys.stderr.fileno()
     31

UnsupportedOperation: fileno
```
The `UnsupportedOperation: fileno` error typically occurs when code tries to redirect standard output (stdout) or standard error (stderr) in an environment that doesn't expose real file descriptors, such as Jupyter/Colab notebooks or certain IDEs. Here it is raised by llama-cpp-python's stdout/stderr suppression, which runs when `verbose=False`; a common workaround in notebooks is to pass `verbose=True` (as the final working code below does), so that redirection is skipped.

The next error appeared when running a query through the custom query engine:

```
ValueError                                Traceback (most recent call last)
<ipython-input-34-df89ae623e7e> in <cell line: 56>()
     54 # Now you can use this custom query engine to perform queries
     55 query_str = "What is the so called inspector view in spotlight?"
---> 56 response = original_query_engine.custom_query(query_str)
     57 print(response)

16 frames
/usr/local/lib/python3.10/dist-packages/llama_index/core/base/embeddings/base.py in similarity(embedding1, embedding2, mode)
     54         return np.dot(embedding1, embedding2)
     55     else:
---> 56         product = np.dot(embedding1, embedding2)
     57         norm = np.linalg.norm(embedding1) * np.linalg.norm(embedding2)
     58         return product / norm

ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)
```
The error `ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)` indicates a mismatch in the dimensions of the two vectors being passed to the dot product: `embedding1` has shape `(1536,)` and `embedding2` has shape `(384,)`, and the dot product requires the dimensions to match. In practice, 1536 is the dimension of OpenAI's default `text-embedding-ada-002` embeddings and 384 is the dimension of `BAAI/bge-small-en-v1.5`, so the likely cause is that the stored index and the query are using different embedding models — for example, an index persisted earlier with the OpenAI default being queried with the HuggingFace model (or vice versa). Without knowing exactly how `embedding1` and `embedding2` were generated, it's difficult to be more specific; refer back to the LlamaIndex documentation or the code that generated the embeddings. For reference, here is the setup that produced the error:

```python
PERSIST_DIR = "storage"

if not os.path.exists(f"{PERSIST_DIR}/docstore.json"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data/paul_graham").load_data()
    index = VectorStoreIndex.from_documents(
        documents,
        embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    )
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# Assuming the retriever and response synthesizer are set up as follows:
retriever = index.as_retriever(similarity_top_k=3, chunk_size=256, chunk_overlap=64)

# Initialize the LLM (e.g., LlamaCPP with your model)
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)

response_synthesizer = get_response_synthesizer(llm=llm)

# Instantiate the OriginalQueryEngine with the retriever and response synthesizer
original_query_engine = NewQueryEngine(
    llm=llm,
    glossary_path="./glossary.json",
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# Now you can use this custom query engine to perform queries
query_str = "What is the so called inspector view in spotlight?"
response = original_query_engine.custom_query(query_str)
print(response)
```
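One way to confirm and fix this (a sketch, reusing the imports from this post) is to check the dimension the embedding model produces and to make sure the same model is used both when the index is built and when it is queried, for example by setting `Settings.embed_model` globally before building or loading the index and rebuilding the persisted `storage` directory if it was created with a different model:

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# bge-small-en-v1.5 produces 384-dimensional vectors, while OpenAI's default
# text-embedding-ada-002 produces 1536-dimensional vectors -- the two shapes
# reported in the traceback above.
print(len(embed_model.get_text_embedding("dimension check")))  # -> 384

# Make this the global default *before* building or loading the index so the
# stored vectors and the query vectors come from the same model.
Settings.embed_model = embed_model

# If the "storage" directory was persisted with a different embedding model,
# delete it and rebuild the index so the dimensions match.
```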
```python
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
```
os.environ["OPENAI_API_KEY"] = "
```python
# Initialize the embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model
```
```python
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
)
```
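Note that once `Settings.embed_model` has been set as above, passing `embed_model` to `from_documents` is redundant: when the argument is omitted, LlamaIndex falls back to the global setting. A minimal sketch:

```python
# Settings.embed_model is already the HuggingFace model, so from_documents
# picks it up automatically and the explicit argument can be dropped.
index = VectorStoreIndex.from_documents(documents)
```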
```python
import os

from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
from llama_index.core import Settings

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # key redacted; never publish a real key

# Initialize the embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Initialize the LLM (e.g., LlamaCPP with your model)
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.0,
    max_new_tokens=2048,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,  # verbose=True also avoids the fileno error in notebooks
)

# With this, only this embed model and LLM will be used everywhere
Settings.embed_model = embed_model
Settings.llm = llm

# Now proceed from here
```
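With `Settings` configured, the rest of the pipeline can rely on those defaults. As a quick sanity check, here is a sketch using the standard LlamaIndex query path (the data directory and question are the ones used earlier in this post):

```python
# Build the index; Settings.embed_model is picked up automatically.
documents = SimpleDirectoryReader("data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query through the default query engine; Settings.llm (the local LlamaCPP
# model) is used to synthesize the answer.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the so called inspector view in spotlight?")
print(response)
```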