
Error Loading Vocabulary from Local HF Embed Model File

At a glance

A community member is using a notebook to convert a HuggingFace embedding model into ONNX format. The model works fine in the notebook, but when they download the files to their local machine, they encounter an error: "OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted."
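For context, the export flow in that kind of notebook looks roughly like this. This is a minimal sketch assuming the OptimumEmbedding helper from llama-index; the model name and folder are illustrative, not taken from the thread:

```python
# Minimal sketch of the ONNX export flow, assuming the OptimumEmbedding
# helper from llama-index (pip install llama-index-embeddings-huggingface-optimum).
# Model name and folder are illustrative.
from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

# Export the HF model to ONNX; this writes model.onnx plus the config and
# tokenizer files (tokenizer.json among them) into the folder.
OptimumEmbedding.create_and_save_optimum_model(
    "BAAI/bge-small-en-v1.5", "./bge_onnx"
)

# Load the exported model back from the folder and embed some text.
embed_model = OptimumEmbedding(folder_name="./bge_onnx")
print(len(embed_model.get_text_embedding("Hello World!")))
```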

In the comments, another community member suggests that the original poster is missing some files. The original poster notes that the vocab.txt file present in the BGE model is replaced by model.onnx_json in the snowflake model, yet the snowflake model runs fine in the same Colab environment where it was generated. However, when they download all 6 files manually to their local machine, they hit the same error.

The issue is resolved when the original poster realizes they were missing the tokenizer.json file.
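A quick way to reproduce or rule out this failure locally is to list the folder contents and load the tokenizer on its own. A sketch, where "./model_onnx" is a placeholder for wherever the downloaded files were saved:

```python
# Sanity-check a downloaded export folder before running the model.
# "./model_onnx" is a placeholder path.
import os
from transformers import AutoTokenizer

folder = "./model_onnx"
print(sorted(os.listdir(folder)))  # tokenizer.json should appear in this list

# If tokenizer files are missing or corrupted, this raises the same kind of
# OSError ("Unable to load vocabulary from file ...") seen in the thread.
tokenizer = AutoTokenizer.from_pretrained(folder)
print(tokenizer("sanity check")["input_ids"])
```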

Useful resources
Using the notebook (https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/) to convert an HF embed model into ONNX. The produced model works fine in the notebook, but when I download the files to my local machine and try to run it, it throws the error below: "OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted."
3 comments
Seems you are missing some files.
I can see the vocab.txt file in the BGE model gets replaced by model.onnx_json in the snowflake model. But snowflake runs fine as long as I run it in the same Colab where I generated it; the moment I download all 6 files manually to my local machine it throws "OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted."
It's working now. Turns out I was missing tokenizer.json.
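For anyone hitting the same thing: downloading the export folder from Colab as a single archive avoids dropping individual files such as tokenizer.json. A sketch assuming the folder was written to /content/model_onnx (the path is an assumption, not from the thread):

```python
# Zip the whole export folder in Colab and download it in one piece,
# instead of fetching the 6 files by hand. The folder path is an assumption.
import shutil
from google.colab import files  # available only inside Colab

shutil.make_archive("model_onnx", "zip", "/content/model_onnx")
files.download("model_onnx.zip")
```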