I wonder what other indices you've tried
I think we need some benchmarking on performance.
I already tried async and some optimizations, but it doesn't really help much.
ListIndex and TreeIndex are too expensive for my use case, and SimpleVectorIndex worked pretty well during testing (i.e. with few docs). I noticed that the top-k similarity setting and prompt engineering have the biggest impact on response time. Though to be fast enough it needs to produce just a few tokens, which means short answers that aren't always the best.
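These are the knobs I mean, roughly. A sketch against the gpt-index-era API (constructor kwargs like `llm_predictor` have moved around between versions, and `./docs` and the question are just placeholders):
```python
from langchain.llms import OpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, SimpleDirectoryReader

# Cap completion length: fewer generated tokens = faster responses
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=100))

documents = SimpleDirectoryReader("./docs").load_data()
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)

# Fewer retrieved chunks = smaller prompt = faster LLM call
response = index.query("your question here", similarity_top_k=1)
```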
I used async as well, same results as you
What chunk size are you using? 1200 with 200 of overlap is maybe too much 🧐
yeah, I'm using the default
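Might be worth dropping it. Chunk size is just a kwarg at construction time; a sketch, though the kwarg name has shifted across gpt-index versions (it later moved to a ServiceContext as `chunk_size`):
```python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./docs").load_data()

# Smaller chunks -> smaller prompts at query time
index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)
```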
I'm trying FAISS and would like to try other vector stores to see if they're better
have you tried langchain?
but I need streaming, and it's not friendly for that
I actually started with langchain but gave it up for llama index lol
With langchain I used to get bad performance in terms of quality, but I was just starting out, so maybe it's worth giving it another shot
Even though gpt index is very friendly, and I've gotten familiar with it now
haha. I started with llama but got to try langchain yesterday
There are a lot more data loaders
which is a huge plus for my use case
Oh really? I feel like llama index has loaders for nearly everything lol
Hey Logan.
I have a question about a design choice.
Say I want to query across 3 docs. First I need to construct an index, then query it. For construction, is it better to use a directory loader to load all three into one index, or to build an index for each doc individually and then compose them with a list index?
It really just depends. If the three documents are very clearly about different topics, then making 3 indexes and wrapping them with a list/keyword/vector index makes sense
But if all the documents cover similar information, one index would be better
That's just how I would approach it anyways
Yeah. Performance overhead is my primary concern, and from my experiments, composable indices run much slower
If you use a list index at the top level, it will check every sub-index, so yeah, pretty slow
For speed, I would use a keyword or vector index at the top. You just need to generate a summary for each sub-index for that to work (either using the LLM, or maybe you have summaries available ahead of time)
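Something like this sketch; the composability API has shifted across versions (`from_indices` is the ~0.5-era spelling), and the paths and summaries here are made up:
```python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from llama_index.composability import ComposableGraph

# One index per document
doc_indexes = [
    GPTSimpleVectorIndex(SimpleDirectoryReader(path).load_data())
    for path in ("./doc_a", "./doc_b", "./doc_c")
]

# Summaries let the top-level index route to the right sub-index
# instead of hitting all of them (hand-written here; you can also
# generate them with the LLM)
summaries = [
    "Financial report for 2022",
    "Engineering design doc",
    "Customer support FAQ",
]

graph = ComposableGraph.from_indices(
    GPTSimpleVectorIndex, doc_indexes, index_summaries=summaries
)
response = graph.query("What were the 2022 revenues?")
```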
In your experience, is langchain also faster with a large amount of data stored?
it depends, but I think FAISS is fast
I just switched from the simple vector store to FAISS
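In case it helps anyone, the switch is basically just wrapping a raw faiss index. A sketch for the `GPTFaissIndex` spelling in the version I'm on (newer versions use a FaissVectorStore instead; 1536 assumes OpenAI ada-002 embeddings):
```python
import faiss
from llama_index import GPTFaissIndex, SimpleDirectoryReader

# 1536 = dimension of OpenAI text-embedding-ada-002 embeddings
faiss_index = faiss.IndexFlatL2(1536)

documents = SimpleDirectoryReader("./docs").load_data()
index = GPTFaissIndex(documents, faiss_index=faiss_index)
```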
Glad to hear that! Out of curiosity, how many seconds does a response take on average?
I'm still benchmarking. I'll update later
Btw, I found loading PDFs is extremely slow
is there any API that can handle this faster?
(You might want to benchmark the embeddings and the LLM separately. If you use `response_mode="no_text"` in your query, it will only fetch the closest nodes and skip calling the LLM to generate text)
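(Concretely, something like this separates the two costs, assuming `index` from earlier:)
```python
import time

# Retrieval only: embed the query and fetch top nodes, no LLM call
t0 = time.perf_counter()
index.query("your question", response_mode="no_text")
retrieval_s = time.perf_counter() - t0

# Full pipeline: retrieval + answer generation
t0 = time.perf_counter()
index.query("your question")
total_s = time.perf_counter() - t0

print(f"retrieval: {retrieval_s:.2f}s, LLM: {total_s - retrieval_s:.2f}s")
```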
PDF loading is done using PyPDF. There are other packages that might have better performance though; I remember seeing a GitHub repo somewhere that benchmarked all the Python PDF libraries
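One option people often swap in is PyMuPDF (`fitz`), which tends to benchmark faster. A sketch of loading a PDF yourself and handing the text to llama_index as a Document (the path is a placeholder):
```python
import fitz  # PyMuPDF: pip install pymupdf
from llama_index import Document, GPTSimpleVectorIndex

def load_pdf(path: str) -> Document:
    # Extract plain text page by page with PyMuPDF instead of PyPDF
    with fitz.open(path) as pdf:
        text = "\n".join(page.get_text() for page in pdf)
    return Document(text)

index = GPTSimpleVectorIndex([load_pdf("./docs/report.pdf")])
```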
Thanks Logan! I'll check it out.
@Ray Li hey, did you find anything in terms of performance? How much time does your index need to respond?