gist:14d34e2aa85f602f7af89813a13ce010

I'm having a problem using TextEmbeddingsInference remotely:
https://gist.github.com/thoraxe/14d34e2aa85f602f7af89813a13ce010

When I index the same documents using local embeddings (even with the same embedding model), I don't get this error.
Ah I see -- local embeddings will just automatically truncate
I'm surprised that TEI doesn't do the same 🀔
i don't see anything in these readmes about token input length
or the embedder
TEI has a "max batch tokens" setting but the default is a very large number
It is related to the model though, BGE has a max input size of 512 tokens
But I would expect it to just truncate like normal embeddings :PSadge:
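For context on what "truncate" means here: local embedding pipelines typically clip the input to the model's 512-token window before running it. A minimal sketch of that idea, using the Hugging Face tokenizer for BGE (illustrative only, not the llama_index internals):

Plain Text
from transformers import AutoTokenizer

# BGE models accept at most 512 tokens per input
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

text = "Hello World! " * 512  # far more than 512 tokens

# truncation=True clips the token sequence to max_length, which is roughly
# what a local embedder does automatically before the model ever sees it
encoding = tokenizer(text, truncation=True, max_length=512)
print(len(encoding["input_ids"]))  # 512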
ah it has a truncate option!
but how to set that haha
looks like truncate is set on the request
if you do I can test it real quick
Yea give me a few to figure out the change and merge it πŸ™‚
hmm trying to test with longer inputs, and I'm getting Error 413 - Payload too large lol

Plain Text
embeddings = embed_model.get_text_embedding("Hello World! " * 512)
apis are fun
well, LMK if you need anything on my end
I might try spinning up my own TEI server, it kind of sounds like a server config maybe?
i shared the CLI options earlier. happy to make whatever change you suggest, but the CLI doesn't have many options for launching the server
yea not really sure how to configure this. Tbh I'm wondering how you didn't get the same error
Well, made a PR, but I couldn't confirm if it actually works haha

https://github.com/run-llama/llama_index/pull/8778/files
Like, requests still work for smaller texts as before, but whenever I tried to test the truncating ability I got that error above 🀔
But since it seemed to work for you, I'll consider that a "me" problem
can you remind me the pip syntax to install from your fork/branch?
bash history ftw
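For reference, the usual syntax for installing from a fork/branch is roughly the following; the fork and branch names are placeholders, not the actual ones used here:

Plain Text
pip install "git+https://github.com/<your-fork>/llama_index.git@<your-branch>"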
do i have to set truncate anywhere or are you doing that buried?
I got the same error at the very end
Plain Text
Traceback (most recent call last):
  File "/home/thoraxe/.pyenv/versions/3.9.16/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/thoraxe/.pyenv/versions/3.9.16/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/thoraxe/Red_Hat/openshift/llamaindex-experiments/fastapi-lightspeed-service/tools/indexer.py", line 39, in <module>
    product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 102, in from_documents
    return cls(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
    super().__init__(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 254, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 235, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 188, in _add_nodes_to_index
    nodes = self._get_node_with_embedding(nodes, show_progress)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 106, in _get_node_with_embedding
    embedding = id_to_embed_map[node.node_id]
KeyError: 'f2e7c36b-fd8c-4562-b366-a6012b3c06bf'
let me try your example
you don't actually persist the docs
I added this code to your example file and got the same error:

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.storage.storage_context import StorageContext

product_documents = SimpleDirectoryReader('data/ocp-product-docs-md').load_data()
storage_context = StorageContext.from_defaults()
# service_context comes from the example file this snippet was added to (it configures the remote TEI embedder)
product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
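The snippet above relies on a service_context defined elsewhere in that example file. It presumably looks roughly like the sketch below; the parameter names and URL are assumptions, not copied from the gist:

Plain Text
from llama_index import ServiceContext
from llama_index.embeddings import TextEmbeddingsInference

# Assumed setup: point the embedder at the remote TEI server.
# model_name and base_url here are illustrative placeholders.
embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",
    base_url="http://<your-tei-host>:8080",
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)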
I can send you these two markdown files
they're just public openshift documentation
still seeing the same problem in the logs of TEI:

Plain Text
Input validation error: `inputs` must have less than 512 tokens. Given: 590
looks like maybe a pip problem
soooooo it's worse now
the progress report shows a lot of:
Plain Text
Generating embeddings:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                    | 202/445 [00:01<00:02, 98.52it/s]<Response [413 Payload Too Large]>
and i get the same error at the end
ok, FWIW I switched to using only the paul graham essay, and got the same error
ok now you are getting the same <Response [413 Payload Too Large]> as me
I get this first:
Plain Text
Parsing documents into nodes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 18.34it/s]
Generating embeddings:   0%|                                                                                                                                         | 0/19 [00:00<?, ?it/s]<Response [413 Payload Too Large]>
Generating embeddings:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                            | 10/19 [00:00<00:00, 75.84it/s]<Response [413 Payload Too Large]>
Generating embeddings: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 19/19 [00:00<00:00, 81.28it/s]
then I get the error:
Plain Text
Traceback (most recent call last):
  File "/home/thoraxe/Red_Hat/openshift/llamaindex-experiments/fastapi-lightspeed-service/tmptest_embedder.py", line 32, in <module>
    product_index = VectorStoreIndex.from_documents(product_documents, storage_context=storage_context, service_context=service_context, show_progress=True)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 102, in from_documents
    return cls(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
    super().__init__(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 254, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 235, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 188, in _add_nodes_to_index
    nodes = self._get_node_with_embedding(nodes, show_progress)
  File "/home/thoraxe/.pyenv/versions/fastapi-ols-39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 106, in _get_node_with_embedding
    embedding = id_to_embed_map[node.node_id]
KeyError: '8576266a-2f97-48da-b62e-d676ebabf473'
yea the second error is related to the embeddings failing πŸ˜…
so I'm not sure this error is related to the TEI truncation
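To unpack that: when a batch request to the embedding server fails, those node ids never make it into the id-to-embedding map, so the later lookup raises the KeyError. A self-contained toy illustration of that failure mode, not the actual llama_index code:

Plain Text
# Toy version of the failure mode: one "batch" fails, its ids never reach
# the map, and the final lookup raises the KeyError seen in the traceback.
def fake_embed(batch):
    if "node-b" in batch:   # pretend this batch got a 413 / validation error
        return []           # no embeddings come back for it
    return [[0.1, 0.2] for _ in batch]

id_to_embed_map = {}
for batch in [["node-a"], ["node-b"], ["node-c"]]:
    for node_id, emb in zip(batch, fake_embed(batch)):
        id_to_embed_map[node_id] = emb

for node_id in ["node-a", "node-b", "node-c"]:
    embedding = id_to_embed_map[node_id]  # KeyError: 'node-b'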
dunno what to tell you 😬
imma open a github issue on TEI
you could tweak the code to use my loader and the graham example and throw that in a gist and you have a reproducer
Yea I was already hitting this locally
but I thought it was just a "me" problem
there's no way i'm sending more than 2MB of data
the whole f'n file is not 2mb
are you going to file a new issue or pile onto that existing one?
but going try one last test with my own docker
just to be sure
hey it works for me lol
Plain Text
import httpx

headers = {"Content-Type": "application/json"}

json_data = {"inputs": "Hello World! " * 512, "truncate": True}
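# "truncate": True asks the TEI server to clip the input to the model's
# 512-token limit instead of rejecting it with a validation error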

with httpx.Client() as client:
    response = client.post(
        "http://127.0.0.1:8080/embed",
        headers=headers,
        json=json_data,
        timeout=60,
    )

data = response.json()
ok going to try with your URL now
hmm that works too... what the πŸ˜…
Somehow I can't reproduce using the raw API like that.

But using the actual embeddings class, it works for me locally, but not for your server
ok got it to reproduce with a similar example to the above. It works on my local server though. So I'm chalking this up to an issue with how it was deployed on your end 🀔

https://gist.github.com/logan-markewich/7a2289ca9efb7ff75ae188c2a2cefb67
That above link works fine for my local docker deployment
but fails when I switch the URL to your server
how did you launch the TEI server locally?
I just ran the docker image

Except I used cpu-latest and removed mentions of GPU from this sample

https://github.com/huggingface/text-embeddings-inference#docker
can you share the exact command you ran?
because a) i'd want to test that locally and b) if you don't have a GPU, maybe something is borked lower-down
let me see if I can find the exact command. Since I launched it a long time ago I can just re-launch from the docker GUI lol
docker run -p 8080:80 -v "C:\Users\logan\Downloads\embeddings_data" --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id "BAAI/bge-large-en-v1.5" --revision "refs/pr/5"
I only have an AMD GPU, so no GPU image for me πŸ™„
ok let me try with CPU and then GPU and see if I get the failures
ok that worked with cpu
failed with GPU
so... what's different?
Plain Text
text_embeddings_core::infer: core/src/infer.rs:100: Input validation error: `inputs` must have less than 512 tokens. Given: 980
trying :latest and not :0.3.0
so there's some bug in 0.3.0
trying with my own docs
I did not expect that difference between 0.3.0 and latest, good catch!
thanks. i saw you were running cpu latest so i figured that was one of the few things left to try