GGML model

Hi guys, I am having problems running HuggingFaceLLM with a local model. I want to run the index completely offline, but I can't instantiate the tokenizer from the .bin file I downloaded from the internet.
This is my code:
Plain Text
model_path = "./models/llama-2-13b.ggmlv3.q4_0.bin"

tokenizer = AutoTokenizer.from_pretrained(model_path)  # line 30, referenced in the traceback below

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    model=model_path,
    tokenizer=tokenizer,
    device_map="cpu",
)

Plain Text
Traceback (most recent call last):
  File "/home/{path}/example1.py", line 30, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_path)
  File "/home/{path}/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 652, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/{path}/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 496, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/{path}/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/home/{path}/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/{path}/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './models/llama-2-13b.ggmlv3.q4_0.bin'. Use `repo_type` argument if needed.
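If I read the error right, from_pretrained wants a Hugging Face repo id or a local directory that contains the tokenizer files, not a single .bin, so it can't read a GGML weights file at all. A minimal sketch of what should load offline instead (the directory path is just a hypothetical example):
Plain Text
# Sketch only: point AutoTokenizer at a local directory that holds the
# tokenizer files (tokenizer.model, tokenizer_config.json, ...), not at
# the GGML weights file. The path below is hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./models/llama-2-13b-hf/")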

Sorry if this is a duplicate; I didn't find anything using the Discord search.
Hi @Logan M, I found that my problem is that I am trying to run a GGML model.
https://discord.com/channels/1059199217496772688/1133465372310372363/1133469324527550514 is what I found, but it is not working. LlamaIndex is requiring an OpenAI API key (is it because the model is wrong?)
Plain Text
# Debug logging
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.llms import LangChainLLM
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

required_exts = ['.txt']
documents = SimpleDirectoryReader(
    './docs',
    recursive=True,
    required_exts=required_exts,
).load_data()

parser = SimpleNodeParser()

nodes = parser.get_nodes_from_documents(documents)

model_path = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

logging.debug(f"Loading model from {model_path}")

llm = LlamaCpp(
    model_path=model_path,
    #input={"temperature": 0.75, "max_length": 2000, "top_p": 1},
    callback_manager=callback_manager,
    verbose=True,
)

logging.debug(f"Wrapping LLM with LangChainLLM")

wrapped_llm = LangChainLLM(
    llm=llm,
    callback_manager=callback_manager
)

logging.debug(f"Creating service context")

service_context = ServiceContext.from_defaults(
    #chunk_size=1024, 
    llm=wrapped_llm,
)

logging.debug(f"Creating index")

index = VectorStoreIndex(
    service_context=service_context,
    nodes=nodes,
    show_progress=True,
)

#index.storage_context.persist(persist_dir="./models/conectar/index")

query_engine = index.as_query_engine()
response = query_engine.query("Terciario es un")
print(response)
So there are two models in LlamaIndex: an LLM and an embedding model

You've set the LLM, but the embedding model is still defaulting to OpenAI

You can configure this pretty easily too
https://gpt-index.readthedocs.io/en/stable/core_modules/model_modules/embeddings/usage_pattern.html#embedding-model-integrations
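Something like this sketch should plug a local embedding model into the ServiceContext you already have (the sentence-transformers model name is just an example; any local embedding model should do):
Plain Text
# Sketch only: wrap a local HuggingFace embedding model so that no call
# goes out to OpenAI. The model name is just an example.
from llama_index import ServiceContext
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

service_context = ServiceContext.from_defaults(
    llm=wrapped_llm,
    embed_model=embed_model,
)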
Thank you so much @Logan M!!! It finally works :chuleta:
Now I need to learn how to configure it properly
Thank you again!!
:peepoexcitedhug: