
Async

At a glance

The community member is asking if they can parallelize the from_documents function, as the embedding process is taking a long time. They have provided their current code, which uses ChromaDB and LlamaIndex to create a vector store index.

In the comments, another community member suggests making the index creation asynchronous to speed up the ingestion process, and provides a link to an example in the LlamaIndex documentation.

There is no explicitly marked answer in the comments.
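The linked docs example is not reproduced in the thread, but a minimal sketch of the suggestion, assuming the pre-0.10 llama_index package used in the question: from_documents accepts a use_async=True flag that dispatches the embedding calls concurrently instead of batch by batch.

from llama_index import StorageContext, VectorStoreIndex

# storage_context is assumed to wrap the Chroma vector store set up in the
# question below; documents is assumed to be loaded earlier,
# e.g. via SimpleDirectoryReader.
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    use_async=True,  # embed batches concurrently rather than one at a time
    show_progress=True,
)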

Hello. Does anyone know if I can parallelize the from_documents function? The embedding process is taking a long time and I was wondering if it can be accomplished in parallel. My code currently is:
import chromadb
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# Persist embeddings in a local Chroma collection
db = chromadb.PersistentClient(path="./polygon")
collection = db.get_or_create_collection("default")
vector_store = ChromaVectorStore(chroma_collection=collection)

# from_documents is a classmethod; passing the vector store through a
# StorageContext makes the embeddings land in Chroma rather than in memory.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
query_engine = index.as_query_engine()

I want to speed up the ingestion of the documents.
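Although no answer is marked, a second route to the literal "parallelize" request is a hedged sketch using LlamaIndex's IngestionPipeline, whose run() method accepts num_workers to fan the split-and-embed work out across processes. The splitter, embedding model, and worker count below are illustrative assumptions, not from the thread.

import chromadb
from llama_index import VectorStoreIndex
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter
from llama_index.vector_stores import ChromaVectorStore

db = chromadb.PersistentClient(path="./polygon")
vector_store = ChromaVectorStore(
    chroma_collection=db.get_or_create_collection("default")
)

# The pipeline splits documents, embeds the chunks, and writes the nodes
# straight into the Chroma collection.
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), OpenAIEmbedding()],
    vector_store=vector_store,
)

# num_workers spreads the transformations across processes; documents is
# assumed to be loaded earlier in the script.
pipeline.run(documents=documents, num_workers=4)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()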