Optimizing LlamaIndex setup for local use

Hey everyone, thanks for having me here.

What hardware (bare metal or VM/VPS) are you running LlamaIndex on, spec-wise?
What does your current infrastructure setup look like (single server, multi-server, 1Gb networking between servers, etc.)?

Any pain points or bottlenecks that you notice?

Thanks in advance. Just trying to get more insight into what I will need to set up locally to have an optimal experience.
System reqs will vary a lot depending on data size and the models that you use.

Using OpenAI or another API-based provider? Definitely don't need a GPU.

Ingesting 1TB of data? You'll need to take special care not to store it all in memory at a given time.

Using some hosted vector db? The default is all in-memory, so for large datasets you'll want a dedicated vector db (either cloud-hosted or hosted yourself).
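
To make the "don't hold it all in memory" point concrete, here is a minimal sketch (not from the thread) of streaming ingestion. It assumes LlamaIndex 0.10+ imports, that a local embedding model has already been configured, and a hypothetical /data/corpus path; iter_data() yields documents file by file so only a small slice of the corpus is in memory at once.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Assumes Settings.embed_model has already been pointed at a local embedding
# model; otherwise LlamaIndex falls back to OpenAI embeddings.
index = VectorStoreIndex(nodes=[])  # start empty; swap in a dedicated vector store for TB-scale data

reader = SimpleDirectoryReader(input_dir="/data/corpus", recursive=True)  # hypothetical path

# iter_data() yields each file's documents lazily instead of loading the
# whole directory up front, keeping memory use roughly per-file.
for docs in reader.iter_data():
    for doc in docs:
        index.insert(doc)
```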
Thanks for the quick answer!
That makes sense.
“Dynamic requirements based on what features of LlamaIndex are being utilized.”

I am not using external providers. I would prefer to keep everything as local as possible. Using various models.

I have a wide range of hardware on-premise, so I am trying to dial in what I might need initially based on others' experience.

Separate from my various non-GPU servers.

I have 3 GPU nodes, each with 18c/36t or 16c/32t and either 64GB or 128GB of RAM:
(Dual A6000 48GB) (NVLink bridge)
(Dual 3090 Ti FE 24GB) (NVLink bridge)
(Quad 3090 Ti FE 12GB)


It is highly likely I will be over the 1TB ingested-data threshold, so let’s just assume that.

As for a hosted vector db: it’s part of why I am curious what infrastructure people are running locally. (But it sounds like most people are using cloud resources?)


I am assuming that the specific node/machine that LlamaIndex is being hosted on will require a lot of RAM? Does it also need or benefit from a GPU for any acceleration itself? (Separate from the GPU nodes running the models being called.)
Only the LLM and embedding model will use the GPU.
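
As a hedged sketch of that split (not from the thread): the embedding model can be pinned to a GPU explicitly, while indexing, retrieval, and query orchestration in LlamaIndex stay on CPU. The model name and device string below are placeholders.

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Only the embedding model (and the LLM, configured elsewhere) touches the GPU;
# the rest of the LlamaIndex process is ordinary CPU work.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",  # placeholder model
    device="cuda",                        # or "cpu" on a node without a GPU
)
```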

In terms of vector DBs, maybe you want to look into Qdrant; they provide Helm charts and whatnot for self-serve deployments.
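
For reference, connecting LlamaIndex to a self-hosted Qdrant instance looks roughly like the sketch below, using the llama-index-vector-stores-qdrant integration; the URL, collection name, and data path are placeholders.

```python
from qdrant_client import QdrantClient
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Placeholder URL for a Qdrant deployment running on your own hardware.
client = QdrantClient(url="http://qdrant.internal:6333")

vector_store = QdrantVectorStore(client=client, collection_name="local_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embeddings are written to Qdrant instead of the default in-memory store.
documents = SimpleDirectoryReader("/data/corpus").load_data()  # hypothetical path
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```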
I might host the LLMs and embeddings on their own servers, maybe using TGI/TEI? vLLM?
Right on, I appreciate the advice. I’ll take a look at Qdrant.

I was going to use a few instances of vLLM for the models, since it seems to be one of the best options performance-wise.
Nice, that makes sense. I can't remember if vLLM supports embedding models or not, though.
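
A hedged sketch of that serving split: vLLM exposes an OpenAI-compatible API that LlamaIndex can talk to through OpenAILike, and embeddings can be served by a separate TEI container through the TextEmbeddingsInference integration. The model names and endpoint URLs below are placeholders.

```python
from llama_index.core import Settings
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference
from llama_index.llms.openai_like import OpenAILike

# vLLM serving an OpenAI-compatible API on a GPU node (placeholder endpoint/model).
Settings.llm = OpenAILike(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://vllm-node:8000/v1",
    api_key="not-needed",
    is_chat_model=True,
)

# Embeddings served separately by Text Embeddings Inference (placeholder endpoint/model).
Settings.embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",
    base_url="http://tei-node:8080",
)
```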