
At a glance

The community member has finetuned a zephyr7b-alpha-GPTQ model using PEFT and pushed the adapter to Hugging Face. They want to use this model, with the adapter, and play around with RAG. The conventional way to use a Hugging Face hosted model, as described in a linked example, works for the zephyr7b-alpha-GPTQ model, but throws an error when trying to use the hosted adapter.

In the comments, another community member suggests loading the model and tokenizer outside of llama-index and passing them directly to HuggingFaceLLM. The original poster agrees this should do the trick: they had tried passing the model directly to ServiceContext, which did not work, but had not thought to pass it to HuggingFaceLLM directly.

Hopefully not too stupid a question. I have finetuned a zephyr7b-alpha-GPTQ via PEFT and pushed the adapter to HuggingFace. I would like to use this model, with the adapter, and play around with RAG. The conventional way to use an HF-hosted model, described here: https://github.com/run-llama/llama_index/blob/main/docs/examples/llm/huggingface.ipynb, works if I just give it the zephyr7b-alpha-GPTQ path, but it throws an error if I point it at my HF-hosted adapter (obvious, I guess, since that is just the adapter).
3 comments
Load the model and tokenizer outside of llama-index, and pass them in:

Plain Text
model = ...
tokenizer = ...
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer, ...)
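Filling in the ellipses, a minimal sketch of that suggestion might look like the following. The repo IDs are placeholders, and the transformers/peft loading calls are assumptions about how the adapter was built, not something stated in the thread; the import path follows the pre-0.10 llama_index layout used in the linked notebook.

Plain Text
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from llama_index.llms import HuggingFaceLLM

# Placeholder repo IDs -- substitute the actual base model and adapter paths.
base_model_id = "TheBloke/zephyr-7B-alpha-GPTQ"
adapter_id = "<your-hf-username>/<your-peft-adapter>"

# Load the quantized base model and tokenizer with transformers,
# then apply the PEFT adapter on top of the base model.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# Pass the in-memory model and tokenizer objects (not paths) to HuggingFaceLLM.
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer, max_new_tokens=256)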
Ah yes, that should do the trick. I attempted to pass the model directly into ServiceContext (which of course wouldn't work), but for some reason it never dawned on me to pass it into HuggingFaceLLM directly; I thought that only works with paths. Thanks!
Yeah, I knew this case would come up, haha, which is why there are both model_name and model kwargs πŸ˜‰
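For the RAG part the original poster mentions, the resulting llm can then go into a ServiceContext rather than the raw model. This is a sketch against the legacy (pre-0.10) llama_index API used in the linked notebook; the data directory and query string are placeholders.

Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# Build a service context around the adapter-backed LLM constructed above,
# using a local embedding model so no external API key is needed.
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Index some placeholder documents and run a query through the finetuned model.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What do these documents say?"))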