Hello - I have a more architectural question, hopefully you can help me like with every other question I've had before (thank you @Logan M 😄). Using the SentenceWindowNodeParser, we end up with a ton of embeddings to calculate. We ingest newly updated documentation almost every week, and it has become difficult to keep up with the embedding computation. Running it on CPU takes a couple of hours; on GPU it's less than 30 minutes. From a cost standpoint, I'm curious whether there is any way to get on-demand GPUs / machines that can do the embedding calculations for us (e.g. serverless GPUs), or any other solution you would recommend. We are currently deployed on GCP, so that would make the most sense.
Thank you!
I have tried https://github.com/huggingface/text-embeddings-inference, which seems to work fine on GPU on my local PC; I could not get it running CPU-only to assess its performance.
Btw, regarding TEI, any idea what the command is to enable truncation for text that is bigger than the embedding input size, so it doesn't error?
Yea, it will truncate in order to avoid errors. Most embedding models are only usable with up to 512 tokens.

Does GCP have on-demand scalable servers? You could deploy TEI on a GPU instance with scaling.

I know huggingface may have some on-demand deployment options too
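A rough sketch of that route on GCP, assuming a plain Compute Engine GPU VM (the instance name, zone, machine type, and GPU type below are placeholders, and the VM still needs the NVIDIA driver and Docker, e.g. by starting from a Deep Learning VM image):

Plain Text
# create an on-demand GPU VM (GPU instances require --maintenance-policy=TERMINATE)
gcloud compute instances create tei-embeddings \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE
# then SSH in, install the NVIDIA driver + Docker (or use a Deep Learning VM image),
# run the TEI docker command below, and point llama-index at http://<vm-ip>:8080

You can stop or delete the instance between ingestion runs so you only pay for GPU time while actually embedding.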
I see in the logs for TEI that truncate is set to false, so it does error out. I'd like to set it to true but could not find anything online on how to do that. I'm running it like this:
Plain Text
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model --revision $revision
Most likely your suggestion is what we're going to do as well: run TEI on a GPU machine in GCP.
idk if we can make a Cloud Run service out of it, probably not, but that seems to be the case
In the latest versions of llama-index, TEI is set to truncate=True by default (it's actually set in the request).
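For reference, here's roughly what that request looks like against TEI's /embed endpoint (the input string is just an example):

Plain Text
curl http://127.0.0.1:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "some text longer than the 512-token limit ...", "truncate": true}'

With truncate set to true, TEI cuts the input down to the model's maximum length instead of returning the "must have less than 512 tokens" validation error.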
So I'm running the latest one and I get this error:

Plain Text
2023-11-13T19:55:49.415598Z ERROR embed:embed{inputs="title: blblablabla_ API Parameters Key Type Example Description token string \"token\":\" <your_token> \" Required." truncate=false permit=OwnedSemaphorePermit { sem: Semaphore { ll_sem: Semaphore { permits: 486 } }, permits: 1 }}: text_embeddings_core::infer: core/src/infer.rs:100: Input validation error: `inputs` must have less than 512 tokens. Given: 602


The way I'm running is this:

Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import TextEmbeddingsInference
from llama_index.llms import OpenAI
from llama_index.node_parser import SentenceWindowNodeParser, SimpleNodeParser

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",
    base_url="http://127.0.0.1:8080",
    timeout=60,  # timeout in seconds
    embed_batch_size=30,
)

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=10,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
simple_node_parser = SimpleNodeParser.from_defaults()

llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.1)

ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

# `nodes` is produced earlier by the node parser from our documents
sentence_index = VectorStoreIndex(nodes, service_context=ctx, show_progress=True)

Am I missing something?
hmmm you sure it's latest? pip show llama-index

I confirmed this worked with my own TEI deployment πŸ€”
Plain Text
Name: llama-index
Version: 0.8.68
Summary: Interface between LLMs and your data
Home-page: https://llamaindex.ai
Author: Jerry Liu
Author-email: jerry@llamaindex.ai
License: MIT
i literally just hit pip upgrade
before sending you the output
mmmmmm

I see in the error it has truncate=False πŸ˜…

What happens if you run print(embed_model.truncate_text)?
If you are running in a notebook, you might need to restart the runtime
i did restart the runtime after upgrade
[Attachment: image.png]
hmmm so you have the attribute, that's good
I ran into this last week
0.3.0 has a bug
you need to use the latest tag
took forever to debug that last week
just remembered lol
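In other words, the same docker command as before, just pulling the latest image tag instead of 0.3.0:

Plain Text
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:latest --model-id $model --revision $revision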
successful 🙂
thanks a ton, yet again!
awesome, glad you got it working!
well, you got it, i just used your knowledge haha