When I reduce `Settings.chunk_size` to 128, TEI embedding slows down dramatically

When I reduce `Settings.chunk_size` to 128, it slows down my TEI embeddings way too much, from 2s to 40s.
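
For context, `Settings.chunk_size` controls how documents are split into nodes before embedding, so dropping it from the default (1024) to 128 multiplies the number of chunks TEI has to embed by roughly 8x. A minimal sketch of the setup being described, assuming a local data directory and an overlap value that the thread never states:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# Smaller chunks -> many more nodes -> many more embedding requests to TEI.
Settings.chunk_size = 128    # value from the thread
Settings.chunk_overlap = 20  # assumption: overlap is not mentioned in the thread

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data directory
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```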
I increased `embed_batch_size` to 128, but that didn't help. Also, it seems like TEI is only using the CPU and not my GPUs, even though I launched its Docker image with `docker run --gpus all`.
Hmm, I think it's something inside `VectorStoreIndex` that's getting bogged down, not TEI itself.
That would generate quite a few chunks. `embed_batch_size` is one param.

`insert_batch_size` could also be raised on the index itself.
Not sure about the GPU issue. I know they have specific Docker images for GPU support.
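
For reference, a hedged sketch of raising `embed_batch_size` on LlamaIndex's TEI wrapper; the class and parameter names follow the `TextEmbeddingsInference` integration as documented, while the model name and URL are placeholders, not values from the thread:

```python
from llama_index.core import Settings
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference

# Larger embed_batch_size -> fewer, larger requests to the TEI server.
Settings.embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",  # placeholder model name
    base_url="http://localhost:8080",     # wherever the TEI container is listening
    embed_batch_size=128,
)
```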
I already have `insert_batch_size` set to 4096.

```python
index = VectorStoreIndex.from_documents(documents, show_progress=True, insert_batch_size=4096)
```
I'm using the correct TEI Docker image for my Ampere GPUs.
[Attachment: image.png]
Tbh I'm going to guess it's just a ton of data to embed
A chunk size of 128 is very small.
Yeah, I think you're right. When I reduce the number of overall documents, it speeds up quite a bit.
When I launch TEI with `docker run --gpus "device=0"`, it does use the first GPU. Setting it to `all` per their directions doesn't seem to work, though.
@Logan M Perhaps there's a way to keep `chunk_size` high but whittle down the text in some nodes so that my resulting prompt doesn't exceed the LLM's context length (and doesn't take forever to write the response)?
That is, for certain nodes, I'd like to do away with irrelevant text.
It removes sentences from chunks based on similarity before synthesizing
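
The reply above doesn't name the component, but it likely refers to a node postprocessor along the lines of LlamaIndex's `SentenceEmbeddingOptimizer`, which drops sentences whose embedding similarity to the query falls below a cutoff. A minimal sketch under that assumption; the cutoff value and query are illustrative:

```python
from llama_index.core.postprocessor import SentenceEmbeddingOptimizer

# Keep only the most query-relevant sentences from each retrieved node
# before they are placed into the LLM prompt.
query_engine = index.as_query_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(percentile_cutoff=0.5)],
)
response = query_engine.query("What does the report say about Q3 revenue?")  # hypothetical query
```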