Hi all - I'm trying to make the simplest possible RAG pipeline calling a local model. If I simply use 'local' for the model name, I get back the expected results from the model query, but if I hardcode the model name to point at my local '/ai/Mistral-7B-v0.1' directory, I get:

Plain Text
/site-packages/transformers/tokenization_utils_base.py", line 2707, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'})

The code is:
Plain Text
local_model = '/ai/Mistral-7B-v0.1'
llm = HuggingFaceLLM(model_name=local_model)
embed_model = HuggingFaceEmbedding(model_name=local_model, tokenizer_name=local_model)
chroma_client = chromadb.PersistentClient()
chroma_collection = chroma_client.create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("data").load_data()
VectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)
I think you need to do what the error is suggesting.

Load the tokenizer outside of llama-index, configure the pad token, then pass it into the LLM class

Plain Text
from transformers import AutoTokenizer

# load the tokenizer yourself and set a pad token (the error suggests reusing the eos token)
tokenizer = AutoTokenizer.from_pretrained("/ai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token

llm = HuggingFaceLLM(tokenizer=tokenizer, ...)
embed_model = HuggingFaceEmbedding(tokenizer=tokenizer, ...)


One thing I noticed, though: you are using Mistral for both the LLM and the embeddings. I would expect that to perform very badly (if it even works) 😅 Use BGE or something more performant for the embeddings.
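For illustration, a minimal sketch of that split, keeping Mistral as the LLM and pointing the embeddings at a BGE checkpoint (BAAI/bge-small-en-v1.5 is just the usual small BGE model; swap in a local path if you have one):

Plain Text
llm = HuggingFaceLLM(model_name='/ai/Mistral-7B-v0.1', tokenizer=tokenizer)
embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-small-en-v1.5')
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)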
Thanks! If I set embed_model to 'local' or 'local:/ai/bge-small-en-v1.5', I get this when I call query_engine.query():
File "...site-packages/torch/nn/functional.py", line 2233, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ IndexError: index out of range in self
πŸ€” what version of llama-index do you have? For me, doing embed_model="local" or embed_model="BAAI/bge-small-en-v1.5" works πŸ€”
Just installed 0.9.21 today with pip install.

Could you post a full script querying a local, non-default model for reference? This is my first time using llama-index; I'm probably doing something dumb.
Now that I look at this again, I think this error is likely coming from the LLM, not the embeddings.
Do you mean the error is coming specifically from the Mistral model? I'll be happy with any example using any non-default model to start with.
Yea, I think it's because at some point an input got too large 🤔
One remedy is to try setting the context_window in the service context a bit lower.
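Something like this rough sketch, assuming the 0.9.x ServiceContext.from_defaults signature (2048 is just an example value to try):

Plain Text
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    context_window=2048,  # lower this if inputs overflow the model's window
)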
Going even as low as context_window=512 results in the same error
Mmm, hard to say without debugging it myself. I would say follow that Colab notebook and see what happens.
There's a similar notebook for Mistral too.
Thank you, this example works for me. If I then switch to my document, I get back junk results, but at least the call no longer crashes, which is progress.
Sorry for jumping into this. I followed the same notebook, but I want a chatbot that can run on my own machine (uni project), and after quantizing I don't see a way to store the quantized model locally :( I read the documentation and there is no obvious command. Would you happen to know what I should do? Thank you in advance! 🥲
I'm assuming you can save/load it the same as any other Hugging Face model?

model.save_pretrained("./path/to/save") ?
That is what I thought, but I just get an error that HuggingFaceLLM doesn't have a save_pretrained function 😦
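Right, save_pretrained lives on the underlying transformers model rather than on the llama-index wrapper. A minimal sketch of working around that, assuming you load the model with transformers yourself and then hand the objects to HuggingFaceLLM via its model/tokenizer arguments (the model id and save path here are placeholders, and whether save_pretrained can serialize your particular quantization depends on your transformers version):

Plain Text
from transformers import AutoModelForCausalLM, AutoTokenizer

# load (and quantize) with transformers directly; add your quantization config here
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# save_pretrained is a transformers method, so call it on these objects
model.save_pretrained("./path/to/save")
tokenizer.save_pretrained("./path/to/save")

# then wrap the already-loaded objects for llama-index
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer)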
Hey Simon, I was wondering how you loaded the local LLM? I downloaded a quantized model, and when I try to load it with

Plain Text
local_model = "/zephyr-7B-beta-GPTQ"
llm = HuggingFaceLLM(model_name=local_model)

I end up getting this error:

Plain Text
PackageNotFoundError: No package metadata was found for auto-gptq
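That PackageNotFoundError usually just means the auto-gptq package isn't installed; transformers needs it (along with optimum) to load GPTQ checkpoints. Assuming a plain pip environment, something like:

Plain Text
pip install auto-gptq optimum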