GGML model

Hi guys, I am having problems running HuggingFaceLLM with a local model; I want to run the index completely offline, but I can't instantiate the tokenizer from the .bin file I downloaded from the internet.
This is my code:
Plain Text
model_path = "./models/llama-2-13b.ggmlv3.q4_0.bin"

tokenizer = AutoTokenizer.from_pretrained(model_path)  # line 30 in the traceback below

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    model=model_path,
    tokenizer=tokenizer,
    device_map="cpu",
)

Plain Text
Traceback (most recent call last):
  File "/home/{path}/example1.py", line 30, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_path)
  File "/home/{path}/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 652, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/{path}/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 496, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/{path}/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/home/{path}/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/{path}/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './models/llama-2-13b.ggmlv3.q4_0.bin'. Use `repo_type` argument if needed.

Sorry if this is a duplicate, I didn't find anything with the Discord search.
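For context: AutoTokenizer.from_pretrained expects a Hugging Face repo id (namespace/repo_name) or a local directory containing the tokenizer files (tokenizer_config.json, etc.), which is why validate_repo_id rejects the path to a single .bin file. Beyond that, GGML files are llama.cpp weights, so transformers can't load them at all; a binding such as llama-cpp-python loads both the weights and the tokenizer from the one file. A minimal sketch of that route, assuming a llama-cpp-python version that still reads GGML and using the model path from the post:
Plain Text
from llama_cpp import Llama

# llama.cpp reads the quantized weights and the embedded tokenizer
# from the single GGML .bin file, entirely offline
llm = Llama(
    model_path="./models/llama-2-13b.ggmlv3.q4_0.bin",
    n_ctx=4096,
)

output = llm("Q: What is a vector index? A:", max_tokens=256, temperature=0.7)
print(output["choices"][0]["text"])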
hi @Logan M, I found that my problem is that I am trying to run a GGML model
https://discord.com/channels/1059199217496772688/1133465372310372363/1133469324527550514 I found that, but it is not working; llama index is requiring an OpenAI API key (is it because the model is wrong)?
Plain Text
#Debug logging
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.llms import LangChainLLM
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

required_exts = ['.txt']
documents = SimpleDirectoryReader(
    './docs',
    recursive=True,
    required_exts=required_exts,
).load_data()

parser = SimpleNodeParser()

nodes = parser.get_nodes_from_documents(documents)

model_path = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

logging.debug(f"Loading model from {model_path}")

llm = LlamaCpp(
    model_path=model_path,
    #input={"temperature": 0.75, "max_length": 2000, "top_p": 1},
    callback_manager=callback_manager,
    verbose=True,
)

logging.debug(f"Wrapping LLM with LangChainLLM")

wrapped_llm = LangChainLLM(
    llm=llm,
    callback_manager=callback_manager
)

logging.debug(f"Creating service context")

service_context = ServiceContext.from_defaults(
    #chunk_size=1024, 
    llm=wrapped_llm,
)

logging.debug(f"Creating index")

index = VectorStoreIndex(
    service_context=service_context,
    nodes=nodes,
    show_progress=True,
)

#index.storage_context.persist(persist_dir="./models/conectar/index")

query_engine = index.as_query_engine()
response = query_engine.query("Terciario es un")
print(response)
So there are two models in llama index, an LLM and an embedding model

You've set the LLM, but the embedding model is still defaulting to OpenAI

You can configure this pretty easily as well
https://gpt-index.readthedocs.io/en/stable/core_modules/model_modules/embeddings/usage_pattern.html#embedding-model-integrations
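A minimal sketch of that usage pattern, reusing the wrapped_llm from the snippet above (import paths vary between llama_index versions, and the sentence-transformers model name here is just an example):
Plain Text
from llama_index import ServiceContext, LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

# a local sentence-transformers model, so embeddings no longer call OpenAI
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

service_context = ServiceContext.from_defaults(
    llm=wrapped_llm,
    embed_model=embed_model,
)
With both llm and embed_model pointing at local models, building and querying the index should no longer ask for an OpenAI API key.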
thank you so much @Logan M !!! It finally works :chuleta:
Now I need to learn how to configure it properly
thank you again!!
:peepoexcitedhug: