
Updated 6 months ago

How would I upsert nodes in parallel

How would I upsert nodes in parallel with VectorStoreIndex and Pinecone, since I have so many nodes to upsert? I would have thought use_async would upsert each batch in parallel, but it just upserts sequentially.
index = VectorStoreIndex(nodes, storage_context=storage_context, use_async=True, insert_batch_size=1500)
2 comments
use_async is just for generating embeddings

If you want to insert in parallel, you could use some lower-level APIs. I might use the ingestion pipeline here

Plain Text
import asyncio

from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(transformations=[SentenceSplitter(), OpenAIEmbedding()])

nodes = await pipeline.arun(documents=documents)

# split the nodes into batches (batch size here is just an example;
# pick one that suits your rate limits)
batches = [nodes[i : i + 100] for i in range(0, len(nodes), 100)]

# one upsert coroutine per batch, awaited concurrently
jobs = [vector_store.async_add(node_batch) for node_batch in batches]
await asyncio.gather(*jobs)

index = VectorStoreIndex.from_vector_store(vector_store)
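The batch-and-gather part of this is plain asyncio, independent of Pinecone or LlamaIndex. A minimal self-contained sketch of the pattern, with a stand-in `fake_async_add` in place of the real `vector_store.async_add` (names and batch size are illustrative only):

```python
import asyncio

# stand-in for vector_store.async_add; a real store would do a network upsert here
async def fake_async_add(batch):
    await asyncio.sleep(0)  # yield control, simulating async I/O
    return len(batch)

async def main():
    nodes = list(range(10))  # stand-in for real nodes
    batch_size = 3
    batches = [nodes[i : i + batch_size] for i in range(0, len(nodes), batch_size)]
    # one coroutine per batch; gather runs them concurrently and
    # returns their results in the same order as the input
    results = await asyncio.gather(*(fake_async_add(b) for b in batches))
    return results

print(asyncio.run(main()))  # [3, 3, 3, 1]
```

Note that `asyncio.gather` takes the coroutines as separate positional arguments, which is why the list is unpacked with `*` rather than passed as-is.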
This makes sense, I will try this