LlamaIndex

Updated 5 months ago

@Logan M I got another one for you. Ill

At a glance

The community member is having issues using local embeddings with the llama_index library. They encountered an error related to the ONNX runtime and the HuggingFace embedding model. After trying various approaches, including making a pull request to fix the issue, the community member ultimately found a solution by cloning the HuggingFace model locally and using the HuggingFaceEmbedding class without the ONNX optimizations.

I got another one for you. I'll make it a thread to keep the server a little cleaner, but it involves using the local embeddings.

45 comments

here is the traceback

Traceback (most recent call last):
  File "/home/adctme/Downloads/kmcraft/llamadex/start_un.py", line 142, in <module>
    vector_index = VectorStoreIndex(base_nodes_2021, service_context=service_context,)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 52, in __init__
    super().__init__(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 262, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 243, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 196, in _add_nodes_to_index
    nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 104, in _get_node_with_embedding
    id_to_embed_map = embed_nodes(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/utils.py", line 137, in embed_nodes
    new_embeddings = embed_model.get_text_embedding_batch(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/embeddings/base.py", line 256, in get_text_embedding_batch
    embeddings = self._get_text_embeddings(cur_batch)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/embeddings/huggingface_optimum.py", line 179, in _get_text_embeddings
    return self._embed(texts)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/embeddings/huggingface_optimum.py", line 140, in _embed
    model_output = self._model(**encoded_input)
  File "/home/adctme/.local/lib/python3.10/site-packages/optimum/modeling_base.py", line 90, in __call__
    return self.forward(*args, **kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 958, in forward
    outputs = self.model.run(None, onnx_inputs)
  File "/home/adctme/.local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid Feed Input Name:token_type_ids
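
For anyone hitting the same `Invalid Feed Input Name:token_type_ids` error: it usually means the tokenizer produced a `token_type_ids` entry that the exported ONNX graph does not declare as an input. A minimal sketch of one way to guard against that is to drop any tokenizer output the model does not accept. The helper name `filter_feed` and the example input names are hypothetical, and this is not necessarily what the fix discussed later in the thread does.

```python
from typing import Any, Dict, Iterable


def filter_feed(encoded: Dict[str, Any], accepted_names: Iterable[str]) -> Dict[str, Any]:
    """Keep only the tokenizer outputs that the model declares as inputs."""
    accepted = set(accepted_names)
    return {name: value for name, value in encoded.items() if name in accepted}


# Stand-in for a tokenizer result (assumed values, for illustration only).
encoded = {
    "input_ids": [[101, 7592, 102]],
    "token_type_ids": [[0, 0, 0]],
    "attention_mask": [[1, 1, 1]],
}

# Assumed input names of the exported graph. With onnxruntime they can be read
# from an InferenceSession via [i.name for i in session.get_inputs()].
accepted_names = ["input_ids", "attention_mask"]

feed = filter_feed(encoded, accepted_names)
print(sorted(feed))
```

The filtered dictionary no longer contains `token_type_ids`, so a graph that only declares `input_ids` and `attention_mask` would not see an unknown feed name.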

here is the embedding model
embed_model = OptimumEmbedding(folder_name="./bge_onnx")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

Ugh yea, I need to fix that

So annoying lol

(That Huggingface error is annoying*)

lol any suggestion for using local embedding

i am open to doing something different

my only real requirement is that the model needs to run completely offline.

I think I need to make a PR to fix this actually. Huggingface/transformers updated and now it throws this wacky error.

one sec I can make a PR now tbh

https://github.com/run-llama/llama_index/compare/logan/fix_hf_embeddings?expand=1

You can install from source to get the fix 👍

pip install git+https://github.com/run-llama/llama_index.git

I'll give it a go in the morning

Traceback (most recent call last):
  File "/home/adctme/Downloads/kmcraft/llamadex/start_un.py", line 142, in <module>
    vector_index = VectorStoreIndex(base_nodes_2021, service_context=service_context,)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 52, in __init__
    super().__init__(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/base.py", line 72, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 262, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 243, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 196, in _add_nodes_to_index
    nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 104, in _get_node_with_embedding
    id_to_embed_map = embed_nodes(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/indices/utils.py", line 137, in embed_nodes
    new_embeddings = embed_model.get_text_embedding_batch(
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/core/embeddings/base.py", line 256, in get_text_embedding_batch
    embeddings = self._get_text_embeddings(cur_batch)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/embeddings/huggingface_optimum.py", line 183, in _get_text_embeddings
    return self._embed(texts)
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/embeddings/huggingface_optimum.py", line 144, in _embed
    model_output = self._model(**encoded_input)
  File "/home/adctme/.local/lib/python3.10/site-packages/optimum/modeling_base.py", line 90, in __call__
    return self.forward(*args, **kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 960, in forward
    last_hidden_state = outputs[self.output_names["last_hidden_state"]]
KeyError: 'last_hidden_state'

lol ok, I actually ran into this the other day

And i couldn't get onnx to work with bge

even though it worked in the past

I tried a million things too

I think either transformers or optimum updated their package version, and that caused some issues (if I had to guess)

do other embedders work ?

unsure -- are you just trying to use onnx for speedups?

maybe use fastembed or text-embeddings-inference instead. Or you can use the base HuggingFaceEmbedding class

I need to sink more time into debugging onnx, just haven't got around to it yet

I was using onnx because it allowed the embedding to be run locally.

HuggingFaceEmbedding does as well

in fact, it's the exact same as what you are currently doing, but without the onnx optimizations

ok so I don't have a problem using the HuggingFace embedding, but it doesn't download that one locally

Re "but it doesn't download that one locally": what do you mean? HuggingFaceEmbedding will download the model weights and run the model locally

I think it downloads it every time. When I look in the cache for the bge weights, it is empty

🤔 It downloads only once for me.

For example

>>> from llama_index.embeddings import HuggingFaceEmbedding
>>> embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Does not download, because I've already downloaded this model before to the default cache_folder.

If I change the cache folder, it will download again. But subsequent calls will not re-download

>>> embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5", cache_folder="./bge_cache")
config.json: 100%|█████| 743/743 [00:00<00:00, 6.44MB/s]
model.safetensors: 100%|█████| 133M/133M [00:03<00:00, 33.5MB/s]
tokenizer_config.json: 100%|█████| 366/366 [00:00<00:00, 2.60MB/s]
vocab.txt: 100%|█████| 232k/232k [00:00<00:00, 2.35MB/s]
tokenizer.json: 100%|█████| 711k/711k [00:00<00:00, 10.4MB/s]
special_tokens_map.json: 100%|█████| 125/125 [00:00<00:00, 1.01MB/s]
>>> embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5", cache_folder="./bge_cache")
>>>
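
To check whether the weights really landed in a cache folder (the concern raised above), a small sketch like the following lists every file under the folder with its size. `list_cached_files` is a hypothetical helper, not part of llama_index, and `./bge_cache` is the folder from the example above. An empty result would mean nothing was cached there.

```python
from pathlib import Path
from typing import List, Tuple


def list_cached_files(cache_folder: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under cache_folder."""
    root = Path(cache_folder)
    if not root.is_dir():
        # A missing folder simply means nothing has been cached there.
        return []
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files.append((str(path.relative_to(root)), path.stat().st_size))
    return files


if __name__ == "__main__":
    # Prints nothing if the folder is missing or empty.
    for name, size in list_cached_files("./bge_cache"):
        print(f"{size:>12d}  {name}")
```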

ok so it did download. but here is the problem. There is still a connection that is hitting the net.

time with an internet connection real 0m21.666s

vs time without internet real 1m41.275s

ANONYMIZED_TELEMETRY=False OPT_OUT_CAPTURING=true SCARF_NO_ANALYTICS=true DO_NOT_TRACK=true
These are the flags i am running with it

Here is the traceback from where the request is:
  File "/home/adctme/Downloads/kmcraft/llamadex/start_un.py", line 75, in <module>
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5", cache_folder="./bge_cache")
  File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/embeddings/huggingface.py", line 82, in __init__
    model = AutoModel.from_pretrained(
  File "/home/adctme/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    resolved_config_file = cached_file(
  File "/home/adctme/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/home/adctme/.local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "/home/adctme/.local/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/adctme/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/home/adctme/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/adctme/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/home/adctme/.local/lib/python3.10/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
  File "/home/adctme/.local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/adctme/.local/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
KeyboardInterrupt
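
Rather than waiting for a timeout and pressing Ctrl-C to see where the request comes from, one option is to make every connection attempt fail immediately, so the traceback points straight at the caller. A minimal sketch, assuming it is acceptable to patch `socket.socket.connect` for the duration of a block. `no_network` is a hypothetical helper, not part of any library mentioned in this thread.

```python
import contextlib
import socket


@contextlib.contextmanager
def no_network():
    """Make any socket connection attempt inside the block raise RuntimeError."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address, *args, **kwargs):
        # RuntimeError is not an OSError, so HTTP libraries that retry on
        # socket errors should not swallow it.
        raise RuntimeError(f"network access attempted: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the real connect, even if the block raised.
        socket.socket.connect = original_connect


# Usage sketch (left commented out because it needs llama_index installed):
# with no_network():
#     embed_model = HuggingFaceEmbedding(
#         model_name="BAAI/bge-small-en-v1.5", cache_folder="./bge_cache"
#     )
```

Anything inside the `with` block that tries to open a connection fails right away with a traceback like the one above, without the long wait.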

hmm weird, it's hitting the hub in order to validate that the cached file is good. Interesting
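
One commonly used way to skip that validation request is Hugging Face's offline mode, controlled by the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables. A minimal sketch, assuming the model is already in the cache and that the variables are set before `transformers` or `huggingface_hub` is imported. This was not tried in the thread, and `enable_hf_offline_mode` is a hypothetical helper.

```python
import os
from typing import MutableMapping


def enable_hf_offline_mode(env: MutableMapping[str, str] = os.environ) -> None:
    """Ask huggingface_hub and transformers to rely on local files only."""
    env["HF_HUB_OFFLINE"] = "1"
    env["TRANSFORMERS_OFFLINE"] = "1"


enable_hf_offline_mode()

# Import the embedding class only after the variables are set, since the
# Hugging Face libraries read them at import time (commented out because it
# needs llama_index installed):
# from llama_index.embeddings import HuggingFaceEmbedding
# embed_model = HuggingFaceEmbedding(
#     model_name="BAAI/bge-small-en-v1.5", cache_folder="./bge_cache"
# )
```

The same variables can also be exported in the shell, alongside the telemetry flags listed above.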

ok option 2

git lfs install
git clone https://huggingface.co/BAAI/bge-small-en-v1.5

Then

>>> from llama_index.embeddings import HuggingFaceEmbedding
>>> embed_model = HuggingFaceEmbedding(model_name="./bge-small-en-v1.5/", tokenizer_name="./bge-small-en-v1.5/")

If it pings HF hub for that, that's a load of 💩 lol

LOL ok i will give it a go

nailed it

Niceeee
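
A common pitfall with the clone approach: if Git LFS is not set up, large files such as `model.safetensors` are checked out as small text pointer files instead of the real weights. The sketch below scans a cloned folder for such pointers. `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, and `./bge-small-en-v1.5` is the folder created by the `git clone` command earlier in the thread.

```python
from pathlib import Path
from typing import List

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir: str) -> List[str]:
    """List files under model_dir (outside .git) that are still LFS pointers."""
    root = Path(model_dir)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file()
        and ".git" not in path.relative_to(root).parts
        and is_lfs_pointer(path)
    )


if __name__ == "__main__":
    for name in find_lfs_pointers("./bge-small-en-v1.5"):
        print(f"still an LFS pointer, weights not fetched: {name}")
```

If this prints anything, running `git lfs install` followed by `git lfs pull` inside the cloned folder should fetch the real files.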
