Ah I see -- local embeddings will just automatically truncate
I'm surprised that TEI doesn't do the same
i don't see anything in these readmes about token input length
TEI has a "max batch tokens" setting but the default is a very large number
It is related to the model though, BGE has a max input size of 512 tokens
But I would expect it to just truncate like normal embeddings :PSadge:
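For context, by "just truncate" I mean roughly what the HF tokenizers do -- a rough sketch, assuming the transformers tokenizer for BAAI/bge-large-en-v1.5 (not code from this thread):
from transformers import AutoTokenizer
# the tokenizer silently clips anything past the model's 512-token limit
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
encoded = tokenizer("Hello World! " * 512, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # capped at 512, no error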
ah it has a truncate option!
looks like truncate is set on the request
if you do I can test it real quick
Yea give me a few to figure out the change and merge it
hmm trying to test with longer inputs, and I'm getting Error 413 - Payload too large
lol
embeddings = embed_model.get_text_embedding("Hello World! " * 512)
well, LMK if you need anything on my end
I might try spinning up my own TEI server, it kind of sounds like a server config maybe?
i shared the CLI options earlier. happy to make whatever change you suggest, but the CLI doesn't have many options for launching the server
yea not really sure how to configure this. Tbh I'm wondering how you didn't get the same error
Like, requests still work for smaller texts as before, but whenever I tried to test the truncating ability I got that error above
But since it seemed to work for you, I'll consider that a "me" problem
can you remind me the pip syntax to install from your fork/branch?
do i have to set truncate anywhere or is that buried somewhere in your changes?
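(for reference, pip can install straight from a git fork/branch -- the user and branch below are placeholders, not the real ones:)
pip install "git+https://github.com/<user>/llama_index.git@<branch>"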
I got the same error at the very end
Traceback (most recent call last):
File "/home/thoraxe/.pyenv/versions/3.9.16/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/thoraxe/.pyenv/versions/3.9.16/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/thoraxe/Red_Hat/openshift/llamaindex-experiments/fastapi-lightspeed-service/tools/indexer.py", line 39, in <module>
product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 102, in from_documents
return cls(
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
super().__init__(
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 71, in __init__
index_struct = self.build_index_from_nodes(nodes)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 254, in build_index_from_nodes
return self._build_index_from_nodes(nodes, **insert_kwargs)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 235, in _build_index_from_nodes
self._add_nodes_to_index(
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 188, in _add_nodes_to_index
nodes = self._get_node_with_embedding(nodes, show_progress)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 106, in _get_node_with_embedding
embedding = id_to_embed_map[node.node_id]
KeyError: 'f2e7c36b-fd8c-4562-b366-a6012b3c06bf'
you don't actually persist the docs
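(a minimal sketch of what persisting would look like with the default local stores -- the persist_dir is just a placeholder:)
# write the freshly built index to disk so it can be reloaded later
product_index.storage_context.persist(persist_dir="./storage")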
I added this code to your example file and got the same error:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.storage.storage_context import StorageContext
product_documents = SimpleDirectoryReader('data/ocp-product-docs-md').load_data()
storage_context = StorageContext.from_defaults()
product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
I can send you these two markdown files
they're just public openshift documentation
still seeing the same problem in the logs of TEI:
Input validation error: `inputs` must have less than 512 tokens. Given: 590
looks like maybe a pip problem
the progress report shows a lot of:
Generating embeddings: 45%|████████████████████████████                                    | 202/445 [00:01<00:02, 98.52it/s]<Response [413 Payload Too Large]>
and i get the same error at the end
ok, FWIW I switched to using only the paul graham essay, and got the same error
ok now you are getting the same <Response [413 Payload Too Large]> as me
I get this first:
Parsing documents into nodes: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 18.34it/s]
Generating embeddings: 0%| | 0/19 [00:00<?, ?it/s]<Response [413 Payload Too Large]>
Generating embeddings: 53%|██████████████████████████████████                              | 10/19 [00:00<00:00, 75.84it/s]<Response [413 Payload Too Large]>
Generating embeddings: 100%|████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 81.28it/s]
then I get the error:
Traceback (most recent call last):
File "/home/thoraxe/Red_Hat/openshift/llamaindex-experiments/fastapi-lightspeed-service/tmptest_embedder.py", line 32, in <module>
product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 102, in from_documents
return cls(
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
super().__init__(
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 71, in __init__
index_struct = self.build_index_from_nodes(nodes)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 254, in build_index_from_nodes
return self._build_index_from_nodes(nodes, **insert_kwargs)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 235, in _build_index_from_nodes
self._add_nodes_to_index(
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 188, in _add_nodes_to_index
nodes = self._get_node_with_embedding(nodes, show_progress)
File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 106, in _get_node_with_embedding
embedding = id_to_embed_map[node.node_id]
KeyError: '8576266a-2f97-48da-b62e-d676ebabf473'
yea the second error is related to the embeddings failing
so I'm not sure this error is related to the TEI truncation
dunno what to tell you
imma open a github issue on TEI
you could tweak the code to use my loader and the graham example and throw that in a gist and you have a reproducer
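something like this, roughly -- everything here is stock llama_index except embed_model, which stands in for the TEI embeddings class from the branch (the data path is assumed too):
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.storage.storage_context import StorageContext

# embed_model = the TEI embeddings class from the branch under test (placeholder here)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
documents = SimpleDirectoryReader("data/paul_graham_essay").load_data()  # assumed path
storage_context = StorageContext.from_defaults()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    show_progress=True,
)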
Yea I was already hitting this locally
but I thought it was just a "me" problem
there's no way i'm sending more than 2MB of data
the whole f'n file is not 2mb
are you going to file a new issue or pile onto that existing one?
but going to try one last test with my own docker
import httpx
headers = {"Content-Type": "application/json"}
json_data = {"inputs": "Hello World! " * 512, "truncate": True}
with httpx.Client() as client:
    response = client.post(
        "http://127.0.0.1:8080/embed",
        headers=headers,
        json=json_data,
        timeout=60,
    )
    data = response.json()
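(one tweak to that snippet that would surface the failure sooner -- check the status before parsing; just a suggestion, not something from the thread:)
assert response.status_code == 200, response.text  # a 413 fails loudly here instead of downstream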
ok going to try with your URL now
hmm that works too... what the
Somehow I can't reproduce using the raw API like that.
But using the actual embeddings class, it works for me locally, but not for your server
That above link works fine for my local docker deployment
but fails when I switch the URL to your server
how did you launch the TEI server locally?
can you share the exact command you ran?
because a) i'd want to test that locally and b) if you don't have a GPU, maybe something is borked lower-down
let me see if I can find the exact command. Since I launched it a long time ago I can just re-launch from the docker GUI lol
docker run -p 8080:80 -v "C:\Users\logan\Downloads\embeddings_data" --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id "BAAI/bge-large-en-v1.5" --revision "refs/pr/5"
I only have an AMD GPU, so no GPU image for me
ok let me try with CPU and then GPU and see if I get the failures
text_embeddings_core::infer: core/src/infer.rs:100: Input validation error: `inputs` must have less than 512 tokens. Given: 980
trying :latest
and not :0.3.0
so there's some bug in 0.3.0
I did not expect that difference between 0.3.0 and latest, good catch!
thanks. i saw you were running cpu latest so i figured that was one of the few things left to try