It's very frustrating that it is so hard to send parallel embeddings to OpenAI with LlamaIndex. With LangChain, my embedding job is done in under 2 mins, but LlamaIndex takes up to an hour. There is no simple way to get parallelized embeddings, but LangChain has this natively. Seriously annoying. Also, this new update has made all of the documentation inaccurate and kind of useless.
Let me know which docs are broken πŸ™‚

You can increase the batch size for embeddings.

Or, you can parallelize the entire ingestion process

https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html#parallel-processing
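Roughly something like this (untested sketch; the data path and worker count are just placeholders):

Python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.text_splitter import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=256, chunk_overlap=20),
        # a bigger embed_batch_size means fewer, larger requests to the OpenAI embeddings API
        OpenAIEmbedding(embed_batch_size=256),
    ],
)

# num_workers fans the documents out across processes
# (may need an `if __name__ == "__main__":` guard on some platforms)
nodes = pipeline.run(documents=documents, num_workers=4)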
thanks for the quick reply!

I'm trying to rework the SEC Insights repo, but the vectorisation takes an enormous amount of time.

I've got it working nicely with LangChain and a FAISS index in a Jupyter notebook. Is there a good way to just pass this FAISS index into a query engine? I'm struggling to find relevant recent information in the documentation
and does the parallelisation approach shown above work via batch sending to OpenAI? My local compute is not suitable
I don't think the FAISS index you created will be compatible. FAISS only stores an ID to embedding vector map, so the actual text needs to be stored somewhere. (in llama-index, this would be in a docstore)
Indeed it does. Let me make an example with FAISS
Well, besides faiss itself missing from the dependencies of llama-index-vector-stores-faiss (maybe because there is a CPU and a GPU version?), I did this:

Python
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.text_splitter import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.vector_stores.faiss import FaissVectorStore

import faiss

# 1536 dims to match OpenAI's text-embedding-ada-002 vectors
vector_store = FaissVectorStore(
    faiss_index=faiss.IndexFlatL2(1536)
)
documents = SimpleDirectoryReader("./docs/examples/data/10k").load_data()

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=256, chunk_overlap=20),
        OpenAIEmbedding(embed_batch_size=256),  # 256 chunks per embeddings request
    ],
)
# run the pipeline
import time
start = time.time()
nodes = pipeline.run(documents=documents)
end = time.time()
print(f"Time taken: {end - start} seconds for {len(nodes)} nodes")
index = VectorStoreIndex(
    nodes=nodes, 
    storage_context=StorageContext.from_defaults(vector_store=vector_store)
)

> Time taken: 17.46170997619629 seconds for 2721 nodes
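And once you have the index, it plugs straight into a query engine like any other. Quick sketch (the question string is just an example):

Python
query_engine = index.as_query_engine()
response = query_engine.query("What was total revenue in the most recent fiscal year?")
print(response)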
tbh setting num_workers in the run() was actually causing issues... flagged this to the colleague who worked on it :PSadge:
In any case, increasing the batch size has a pretty large effect on runtime already
the max batch size is 2048
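so if you want to push it all the way (sketch, same embedding class as above):

Python
# 2048 inputs per request is the OpenAI embeddings API cap
embed_model = OpenAIEmbedding(embed_batch_size=2048)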
ok thanks so much this is very helpful!
the batch size is clearly what I wasn't able to find
It should help quite a bit!

Yea I feel ya, easy to miss. We are working on a pretty large docs overhaul, with a large focus on proper API docs πŸ™‚
Hopefully ready in 2-3 weeks πŸ™
hey! this is still taking 20+ minutes for me. It's a 4k-page document, but I feel like it shouldn't be anywhere near that long? The num_workers seems to break it, so I've been just setting the batch size to 2048. Could the fact I'm running it in a Colab notebook have an effect?
oh! in colab hey

I've definitely noticed OpenAI seems to rate limit requests from google colab notebooks much more than from your own computer
oookay good to know
i'll try locally