load index

Hi, is there any way to make the index in RAM faster? I'm using this call:

index_finance = load_index_from_storage(storage_context)

but for a file of around 5GB it takes too long. It seems to use only one core, and it switches cores for each node it loads
Hey @davidp,

For 5GB of data I would recommend using a vector DB instead.
hmm, but once the index is loaded in RAM, wouldn't using it directly be faster than a vector DB?
I don't think so. @Logan M can confirm on this.
Loading the 5GB vector store file took 20 minutes on my server, and it then takes 20GB of RAM. That's not dramatic; I could load many more documents on a 64GB-RAM machine. But then if I'm serving the RAG over the Internet I can only serve 1 client xd. What's the way to scale these systems?
A vectorDB would most certainly be faster for this amount of data

The default vector store in llamaindex does very simple retrieval -- it does a pairwise comparison of the query to each embedding using cosine similarity. Plus, loading 5GB of data into memory is slow since it's not optimized or compressed

Even in-memory vector dbs like qdrant or chroma will be dramatically faster and more optimized at this scale
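(For reference, a minimal sketch of what that brute-force pairwise comparison looks like -- plain numpy, hypothetical names, not LlamaIndex's actual code -- which is why it slows down with 5GB of embeddings:)

Plain Text
import numpy as np

def cosine_top_k(query_emb, doc_embs, k=2):
    # normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                       # one score per stored node
    top = np.argsort(scores)[::-1][:k]   # brute force: touches every embedding
    return top, scores[top]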
got it @Logan M. When going for a vector DB, between an in-memory one like chroma or qdrant and a disk-based one like astradb or weaviate, which would you prefer for speed?
by the way, is weaviate partially in RAM?
Any other vector db will also use ram, and most have some option to save to disk. I'm just saying they are extremely optimized for the task and should be much faster 👍
that sounds good. Because I'm running a local Llama with Ollama on an M1 Pro chip I get total times of 25-35 seconds per request, so I was neglecting the query vectorization and retrieval time. But once I put Ollama on a GPU it will make much more sense to tune the retrieval part by using some db like pinecone
@Logan M in terms of indexing, when not using Chroma for example, is there any difference between building a VectorStoreIndex from documents and loading it from disk?

A:
documentsAll = SimpleDirectoryReader("/mnt/nasmixprojects/books/", recursive=True).load_data()
index_all = VectorStoreIndex.from_documents(documentsAll, service_context=service_context)

B:
storage_context = StorageContext.from_defaults(persist_dir="sentences_2023_12_02_index_all")
index_finance = load_index_from_storage(storage_context)
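(Note: B only works if the index built in A was persisted first; a minimal sketch, assuming the same persist_dir as above:)

Plain Text
index_all.storage_context.persist(persist_dir="sentences_2023_12_02_index_all")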
@Logan M @ravitheja I've tested with and without ChromaDB and answering a query has taken the same time:

chroma (first image):
no chroma (second image):
in both cases it took 30 minutes to build the index
Attachments: image.png (chroma), image.png (no chroma)
query: response = query_engine3.query("what is antifragility?")
with chroma: 14.7s
without chroma: 14s
if I repeat the question in both cases it can then take 8s

it was 31MB of pdfs and epub books.
Two separate notebooks on the same server, executed one after the other.
Building the index would be the same speed, I think
a) but loading will be faster with chroma (or any vector db really), although chroma doesn't really advertise "speed" as a feature as much as other vector dbs do
b) it will scale better to larger amounts of data without using so much RAM

I think testing with query() is not quite the right approach. The majority of the time will be spent in LLM calls, by quite a large margin. I would test retrieve instead

Plain Text
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("query")
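(To separate retrieval latency from LLM latency, a quick timing sketch -- hypothetical, reusing the retriever above and the question from the thread:)

Plain Text
import time

t0 = time.perf_counter()
nodes = retriever.retrieve("what is antifragility?")
print(f"retrieval only: {time.perf_counter() - t0:.2f}s")  # excludes the LLM call entirely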
got it. So, the improvement has to be in the retrieval part only. However, I'm loading the same documents now with Chroma and without a vectorDB and the RAM use is very different. I'll wait until the load is completed and report back:
Attachment: image.png
By the way, is there any way to store the ChromaDB on disk so it loads faster next time? It seems this ephemeral client is only for in-RAM use:
chroma_client = chromadb.EphemeralClient()
You can use the PersistentClient: db = chromadb.PersistentClient(path="./chroma_db")
Or you can host a server and use an HttpClient -- many options 🙂
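(A minimal sketch of wiring the persistent client into LlamaIndex -- assuming llama-index 0.9.x-style imports to match the service_context usage above; the collection name is made up:)

Plain Text
import chromadb
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("books")
vector_store = ChromaVectorStore(chroma_collection=collection)

# first run: embed once, Chroma writes the vectors to ./chroma_db
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documentsAll, storage_context=storage_context, service_context=service_context)

# later runs: reattach to the persisted collection, no re-embedding needed
index = VectorStoreIndex.from_vector_store(vector_store)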