It's very frustrating that it is so hard to send parallel embeddings to OpenAI with LlamaIndex. With LangChain, my embedding job is done in under 2 mins, but LlamaIndex takes up to an hour. There is no simple way to get parallelized embeddings, but LangChain has this natively. Seriously annoying. Also, this new update has made all of the documentation inaccurate and kind of useless.
Let me know which docs are broken πŸ™‚

You can increase the batch size for embeddings.

Or, you can parallelize the entire ingestion process

https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html#parallel-processing
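Roughly something like this (untested sketch; the data path and worker count are just placeholders):

Python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.text_splitter import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=256, chunk_overlap=20),
        # a bigger embed_batch_size means fewer, larger requests to the OpenAI embeddings API
        OpenAIEmbedding(embed_batch_size=256),
    ],
)

# num_workers fans the documents out across processes
# (may need an `if __name__ == "__main__":` guard on some platforms)
nodes = pipeline.run(documents=documents, num_workers=4)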
thanks for the quick reply!

I'm trying to rework the SEC Insights repo, but the vectorisation takes an enormous amount of time.

I've got it working nicely with LangChain and a FAISS index in a Jupyter notebook. Is there a good way to just pass this FAISS index into a query engine? I'm struggling to find relevant recent information in the documentation
and does the parallelisation approach shown above work via batch sending to OpenAI? My local compute is not suitable
I don't think the FAISS index you created will be compatible. FAISS only stores an ID to embedding vector map, so the actual text needs to be stored somewhere. (in llama-index, this would be in a docstore)
Indeed it does. Let me make an example with FAISS
Well, besides faiss itself missing from the dependencies of llama-index-vector-stores-faiss (maybe because there is a CPU and a GPU version?), I did this:

Python
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.text_splitter import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.vector_stores.faiss import FaissVectorStore

import faiss

# 1536 dims to match OpenAI's text-embedding-ada-002 vectors
vector_store = FaissVectorStore(
    faiss_index=faiss.IndexFlatL2(1536)
)
documents = SimpleDirectoryReader("./docs/examples/data/10k").load_data()

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=256, chunk_overlap=20),
        OpenAIEmbedding(embed_batch_size=256),  # 256 chunks per embeddings request
    ],
)
# run the pipeline
import time
start = time.time()
nodes = pipeline.run(documents=documents)
end = time.time()
print(f"Time taken: {end - start} seconds for {len(nodes)} nodes")
index = VectorStoreIndex(
    nodes=nodes, 
    storage_context=StorageContext.from_defaults(vector_store=vector_store)
)

> Time taken: 17.46170997619629 seconds for 2721 nodes
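And once you have the index, it plugs straight into a query engine like any other. Quick sketch (the question string is just an example):

Python
query_engine = index.as_query_engine()
response = query_engine.query("What was total revenue in the most recent fiscal year?")
print(response)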
tbh setting num_workers in the run() was actually causing issues... flagged this to the colleague who worked on it :PSadge:
In any case, increasing the batch size has a pretty large effect on runtime already
the max batch size is 2048
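so if you want to push it all the way (sketch, same embedding class as above):

Python
# 2048 inputs per request is the OpenAI embeddings API cap
embed_model = OpenAIEmbedding(embed_batch_size=2048)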
ok thanks so much this is very helpful!
the batch size is clearly what I wasn't able to find
It should help quite a bit!

Yea I feel ya, easy to miss. We are working on a pretty large docs overhaul, with a large focus on proper API docs πŸ™‚
Hopefully ready in 2-3 weeks πŸ™
hey! this is still taking 20+ minutes for me. It's a 4k-page document, but I feel like it shouldn't be anywhere near that long? The num_workers seems to break it, so I've been just setting the batch size to 2048. Could the fact I'm running it in a Colab notebook have an effect?
oh! in colab hey

I've definitely noticed OpenAI seems to rate limit requests from google colab notebooks much more than from your own computer
oookay good to know
i'll try locally