Hello - I have a more architectural question, hopefully you can help me like with every other question I've had before (thank you @Logan M 😄). Using the SentenceWindowNodeParser, we end up with a ton of embeddings to calculate. We ingest newly updated documentation almost every week, and it has become difficult to keep up with the embedding computation. Running it on CPU takes a couple of hours; on GPU it's less than 30 minutes. From a cost standpoint, I'm curious whether there is any way to get on-demand GPUs / machines that can do the embedding calculations for us (e.g. serverless GPUs), or any other solution you would recommend. We are currently deployed on GCP, so that would make the most sense.
Thank you!
I have tried https://github.com/huggingface/text-embeddings-inference, which seems to work fine on GPU on my local PC; I could not get it running CPU-only to assess its performance.
Btw, regarding TEI, any idea what the command is to enable truncation for text that is bigger than the embedding input size, so it doesn't error?
Yea, it will truncate in order to avoid errors. Most embedding models are only usable with up to 512 tokens.

Does GCP have on-demand scalable servers? You could deploy TEI on a GPU instance with scaling.

I know huggingface may have some on-demand deployment options too
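A rough sketch of that route on GCP, assuming a plain Compute Engine GPU VM (the instance name, zone, machine type, and GPU type below are placeholders, and the VM still needs the NVIDIA driver and Docker, e.g. by starting from a Deep Learning VM image):

Plain Text
# create an on-demand GPU VM (GPU instances require --maintenance-policy=TERMINATE)
gcloud compute instances create tei-embeddings \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE
# then SSH in, install the NVIDIA driver + Docker (or use a Deep Learning VM image),
# run the TEI docker command below, and point llama-index at http://<vm-ip>:8080

You can stop or delete the instance between ingestion runs so you only pay for GPU time while actually embedding.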
I see in the logs for TEI that truncate is set to false, so it does error out. I'd like to set it to true but could not find anything online on how to do that. I'm running it like this:
Plain Text
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model --revision $revision
Most likely your suggestion is what we're going to do as well: run TEI on a GPU machine in GCP.
idk if we can make a Cloud Run service out of it, probably not, but that seems to be the case
In the latest versions of llama-index, TEI is set to truncate=True by default (it's actually set in the request).
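For reference, here's roughly what that request looks like against TEI's /embed endpoint (the input string is just an example):

Plain Text
curl http://127.0.0.1:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "some text longer than the 512-token limit ...", "truncate": true}'

With truncate set to true, TEI cuts the input down to the model's maximum length instead of returning the "must have less than 512 tokens" validation error.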
So I'm running the latest one and I get this error:

Plain Text
2023-11-13T19:55:49.415598Z ERROR embed:embed{inputs="title: blblablabla_ API Parameters Key Type Example Description token string \"token\":\" <your_token> \" Required." truncate=false permit=OwnedSemaphorePermit { sem: Semaphore { ll_sem: Semaphore { permits: 486 } }, permits: 1 }}: text_embeddings_core::infer: core/src/infer.rs:100: Input validation error: `inputs` must have less than 512 tokens. Given: 602


The way I'm running is this:

Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import TextEmbeddingsInference
from llama_index.llms import OpenAI
from llama_index.node_parser import SentenceWindowNodeParser, SimpleNodeParser

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",
    base_url="http://127.0.0.1:8080",
    timeout=60,  # timeout in seconds
    embed_batch_size=30,
)

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=10,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
simple_node_parser = SimpleNodeParser.from_defaults()

llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.1)

ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

# `nodes` is produced earlier by the node parser from our documents
sentence_index = VectorStoreIndex(nodes, service_context=ctx, show_progress=True)

Am I missing something?
hmmm you sure it's latest? pip show llama-index

I confirmed this worked with my own TEI deployment πŸ€”
Plain Text
Name: llama-index
Version: 0.8.68
Summary: Interface between LLMs and your data
Home-page: https://llamaindex.ai
Author: Jerry Liu
Author-email: jerry@llamaindex.ai
License: MIT
i literally just hit pip upgrade
before sending you the output
mmmmmm

I see in the error it has truncate=False πŸ˜…

What happens if you run print(embed_model.truncate_text)?
If you are running in a notebook, you might need to restart the runtime
i did restart the runtime after upgrade
[Attachment: image.png]
hmmm so you have the attribute, that's good
I ran into this last week
0.3.0 has a bug
you need to use the latest tag
took forever to debug that last week
just remembered lol
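In other words, the same docker command as before, just pulling the latest image tag instead of 0.3.0:

Plain Text
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:latest --model-id $model --revision $revision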
successful 🙂
thanks a ton, yet again!
awesome, glad you got it working!
well, you got it, i just used your knowledge haha