When I reduce `Settings.chunk_size` to 128, my TEI embeddings slow down dramatically

When I reduce `Settings.chunk_size` to 128, it slows down my TEI embeddings far too much, from 2 s to 40 s.
I increased `embed_batch_size` to 128, but that didn't help. Also, TEI seems to be using only the CPU and not my GPUs, even though I launched its Docker image with `docker run --gpus all`.
Hmm, I think it's something inside `VectorStoreIndex` that's getting bogged down, not TEI itself.
That would generate quite a few chunks. `embed_batch_size` is one parameter to raise.

`insert_batch_size` could also be raised on the index itself (see the sketch below).
Not sure about the GPU issue. I know they have specific Docker images for GPU support.
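For reference, a minimal sketch of where those two knobs live. The model name and TEI endpoint URL below are placeholders, and it assumes the `llama-index` TEI embedding integration (`TextEmbeddingsInference`) is installed:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference

# embed_batch_size controls how many chunks are sent to TEI per request.
Settings.embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-base-en-v1.5",  # placeholder model
    base_url="http://localhost:8080",    # placeholder TEI endpoint
    embed_batch_size=128,
)
Settings.chunk_size = 128

documents = SimpleDirectoryReader("./data").load_data()

# insert_batch_size controls how many nodes are written to the vector
# store per batch while the index is being built.
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
    insert_batch_size=4096,
)
```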
I already have `insert_batch_size` set to 4096:
```python
index = VectorStoreIndex.from_documents(documents, show_progress=True, insert_batch_size=4096)
```
I'm using the correct TEI Docker image for my Ampere GPUs.
Tbh, I'm going to guess it's just a ton of data to embed; a chunk size of 128 is very small.
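To quantify that, you can count how many nodes the splitter produces before building the index. This is a rough sketch that assumes the default sentence splitter and the same `documents` list loaded earlier:

```python
from llama_index.core.node_parser import SentenceSplitter

# A smaller chunk_size means many more chunks for the same corpus,
# and therefore many more embedding requests to TEI.
for chunk_size in (512, 128):
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=20)
    nodes = splitter.get_nodes_from_documents(documents)
    print(f"chunk_size={chunk_size}: {len(nodes)} nodes to embed")
```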
Yeah, I think you're right. When I reduce the overall number of documents, it speeds up quite a bit.
When I launch TEI with `docker run --gpus "device=0"`, it does use the first GPU. Setting it to `all` per their directions doesn't seem to work, though.
@Logan M Perhaps there's a way to keep `chunk_size` high but whittle down the text in some nodes so that the resulting prompt doesn't exceed the LLM's context length (and doesn't take forever to generate the response)?
That is, for certain nodes, I'd like to drop the irrelevant text.
It removes sentences from chunks based on similarity before synthesizing
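This sounds like the `SentenceEmbeddingOptimizer` node postprocessor, though the thread doesn't name it, so treat that as an assumption. A minimal sketch of wiring it into a query engine, with the cutoff value chosen arbitrarily:

```python
from llama_index.core import Settings
from llama_index.core.postprocessor import SentenceEmbeddingOptimizer

# Drops the least query-relevant sentences from each retrieved node
# before synthesis; percentile_cutoff=0.5 (keep the top half of
# sentences) is an arbitrary example value.
optimizer = SentenceEmbeddingOptimizer(
    embed_model=Settings.embed_model,
    percentile_cutoff=0.5,
)

# `index` is the VectorStoreIndex built earlier in the thread.
query_engine = index.as_query_engine(node_postprocessors=[optimizer])
response = query_engine.query("your question here")
```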