gist:14d34e2aa85f602f7af89813a13ce010

I'm having a problem using TextEmbeddingsInference remotely:
https://gist.github.com/thoraxe/14d34e2aa85f602f7af89813a13ce010

When I index the same documents using local embeddings (even with the same embedding model), I don't get this error.
Ah I see -- local embeddings will just automatically truncate
I'm surprised that TEI doesn't do the same 🀔
i don't see anything in these readmes about token input length
or the embedder
TEI has a "max batch tokens" setting but the default is a very large number
It is related to the model though, BGE has a max input size of 512 tokens
But I would expect it to just truncate like normal embeddings :PSadge:
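For context on what "truncate" means here: local embedding pipelines typically clip the input to the model's 512-token window before running it. A minimal sketch of that idea, using the Hugging Face tokenizer for BGE (illustrative only, not the llama_index internals):

Plain Text
from transformers import AutoTokenizer

# BGE models accept at most 512 tokens per input
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

text = "Hello World! " * 512  # far more than 512 tokens

# truncation=True clips the token sequence to max_length, which is roughly
# what a local embedder does automatically before the model ever sees it
encoding = tokenizer(text, truncation=True, max_length=512)
print(len(encoding["input_ids"]))  # 512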
ah it has a truncate option!
but how to set that haha
looks like truncate is set on the request
if you do I can test it real quick
Yea give me a few to figure out the change and merge it πŸ™‚
hmm trying to test with longer inputs, and I'm getting Error 413 - Payload too large lol

Plain Text
embeddings = embed_model.get_text_embedding("Hello World! " * 512)
apis are fun
well, LMK if you need anything on my end
I might try spinning up my own TEI server, it kind of sounds like a server config maybe?
i shared the CLI options earlier. happy to make whatever change you suggest, but the CLI doesn't have many options for launching the server
yea not really sure how to configure this. Tbh I'm wondering how you didn't get the same error
Well, made a PR, but I couldn't confirm if it actually works haha

https://github.com/run-llama/llama_index/pull/8778/files
Like, requests still work for smaller texts as before, but whenever I tried to test the truncating ability I got that error above 🀔
But since it seemed to work for you, I'll consider that a "me" problem
can you remind me the pip syntax to install from your fork/branch?
bash history ftw
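For reference, the usual syntax for installing from a fork/branch is roughly the following; the fork and branch names are placeholders, not the actual ones used here:

Plain Text
pip install "git+https://github.com/<your-fork>/llama_index.git@<your-branch>"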
do i have to set truncate anywhere or are you doing that buried?
I got the same error at the very end
Plain Text
Traceback (most recent call last):
  File "/home/thoraxe/.pyenv/versions/3.9.16/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/thoraxe/.pyenv/versions/3.9.16/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/thoraxe/Red_Hat/openshift/llamaindex-experiments/fastapi-lightspeed-service/tools/indexer.py", line 39, in <module>
    product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 102, in from_documents
    return cls(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
    super().__init__(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 254, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 235, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 188, in _add_nodes_to_index
    nodes = self._get_node_with_embedding(nodes, show_progress)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 106, in _get_node_with_embedding
    embedding = id_to_embed_map[node.node_id]
KeyError: 'f2e7c36b-fd8c-4562-b366-a6012b3c06bf'
let me try your example
you don't actually persist the docs
I added this code to your example file and got the same error:

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.storage.storage_context import StorageContext

product_documents = SimpleDirectoryReader('data/ocp-product-docs-md').load_data()
storage_context = StorageContext.from_defaults()
# service_context comes from the example file this snippet was added to (it configures the remote TEI embedder)
product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
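The snippet above relies on a service_context defined elsewhere in that example file. It presumably looks roughly like the sketch below; the parameter names and URL are assumptions, not copied from the gist:

Plain Text
from llama_index import ServiceContext
from llama_index.embeddings import TextEmbeddingsInference

# Assumed setup: point the embedder at the remote TEI server.
# model_name and base_url here are illustrative placeholders.
embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",
    base_url="http://<your-tei-host>:8080",
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)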
I can send you these two markdown files
they're just public openshift documentation
still seeing the same problem in the logs of TEI:

Plain Text
Input validation error: `inputs` must have less than 512 tokens. Given: 590
looks like maybe a pip problem
soooooo it's worse now
the progress report shows a lot of:
Plain Text
Generating embeddings:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                    | 202/445 [00:01<00:02, 98.52it/s]<Response [413 Payload Too Large]>
and i get the same error at the end
ok, FWIW I switched to using only the paul graham essay, and got the same error
ok now you are getting the same <Response [413 Payload Too Large]> as me
I get this first:
Plain Text
Parsing documents into nodes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 18.34it/s]
Generating embeddings:   0%|                                                                                                                                         | 0/19 [00:00<?, ?it/s]<Response [413 Payload Too Large]>
Generating embeddings:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                            | 10/19 [00:00<00:00, 75.84it/s]<Response [413 Payload Too Large]>
Generating embeddings: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 19/19 [00:00<00:00, 81.28it/s]
then I get the error:
Plain Text
Traceback (most recent call last):
  File "/home/thoraxe/Red_Hat/openshift/llamaindex-experiments/fastapi-lightspeed-service/tmptest_embedder.py", line 32, in <module>
    product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 102, in from_documents
    return cls(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
    super().__init__(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 254, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 235, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 188, in _add_nodes_to_index
    nodes = self._get_node_with_embedding(nodes, show_progress)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 106, in _get_node_with_embedding
    embedding = id_to_embed_map[node.node_id]
KeyError: '8576266a-2f97-48da-b62e-d676ebabf473'
yea the second error is related to the embeddings failing πŸ˜…
so I'm not sure this error is related to the TEI truncation
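To unpack that: when a batch request to the embedding server fails, those node ids never make it into the id-to-embedding map, so the later lookup raises the KeyError. A self-contained toy illustration of that failure mode, not the actual llama_index code:

Plain Text
# Toy version of the failure mode: one "batch" fails, its ids never reach
# the map, and the final lookup raises the KeyError seen in the traceback.
def fake_embed(batch):
    if "node-b" in batch:   # pretend this batch got a 413 / validation error
        return []           # no embeddings come back for it
    return [[0.1, 0.2] for _ in batch]

id_to_embed_map = {}
for batch in [["node-a"], ["node-b"], ["node-c"]]:
    for node_id, emb in zip(batch, fake_embed(batch)):
        id_to_embed_map[node_id] = emb

for node_id in ["node-a", "node-b", "node-c"]:
    embedding = id_to_embed_map[node_id]  # KeyError: 'node-b'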
dunno what to tell you 😬
imma open a github issue on TEI
you could tweak the code to use my loader and the graham example and throw that in a gist and you have a reproducer
Yea I was already hitting this locally
but I thought it was just a "me" problem
there's no way i'm sending more than 2MB of data
the whole f'n file is not 2mb
are you going to file a new issue or pile onto that existing one?
but going try one last test with my own docker
just to be sure
hey it works for me lol
Plain Text
import httpx

headers = {"Content-Type": "application/json"}

json_data = {"inputs": "Hello World! " * 512, "truncate": True}
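# "truncate": True asks the TEI server to clip the input to the model's
# 512-token limit instead of rejecting it with a validation error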

with httpx.Client() as client:
    response = client.post(
        "http://127.0.0.1:8080/embed",
        headers=headers,
        json=json_data,
        timeout=60,
    )

data = response.json()
ok going to try with your URL now
hmm that works too... what the πŸ˜…
Somehow I can't reproduce using the raw API like that.

But using the actual embeddings class, it works for me locally, but not for your server
ok got it to reproduce with a similar example to the above. It works on my local server though. So I'm chalking this up to an issue with how it was deployed on your end 🀔

https://gist.github.com/logan-markewich/7a2289ca9efb7ff75ae188c2a2cefb67
That above link works fine for my local docker deployment
but fails when I switch the URL to your server
how did you launch the TEI server locally?
I just ran the docker image

Except I used cpu-latest and removed mentions of GPU from this sample

https://github.com/huggingface/text-embeddings-inference#docker
can you share the exact command you ran?
because a) i'd want to test that locally and b) if you don't have a GPU, maybe something is borked lower-down
let me see if I can find the exact command. Since I launched it a long time ago I can just re-launch from the docker GUI lol
docker run -p 8080:80 -v "C:\Users\logan\Downloads\embeddings_data" --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id "BAAI/bge-large-en-v1.5" --revision "refs/pr/5"
I only have an AMD GPU, so no GPU image for me πŸ™„
ok let me try with CPU and then GPU and see if I get the failures
ok that worked with cpu
failed with GPU
so... what's different?
Plain Text
text_embeddings_core::infer: core/src/infer.rs:100: Input validation error: `inputs` must have less than 512 tokens. Given: 980
trying :latest and not :0.3.0
so there's some bug in 0.3.0
trying with my own docs
I did not expect that difference between 0.3.0 and latest, good catch!
thanks. i saw you were running cpu latest so i figured that was one of the few things left to try