Indexing

Help with creating a vector store: the TQDM progress bar sits at 0% until it completes (if it completes). Perhaps I'm going about this incorrectly somehow, but building an index with VectorStoreIndex.from_documents is taking a monumentally long time.
Python
# Imports assume the llama-index >= 0.10 package layout
from llama_index.core import Settings
from llama_index.core.extractors import QuestionsAnsweredExtractor, SummaryExtractor
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.ollama import Ollama

# Local llama3 served by Ollama; long timeout for slow generations
llm = Ollama(model="llama3", request_timeout=360.0)
Settings.llm = llm

# Both extractors make one LLM call per node
num_workers = 2
transformations = [
    SentenceSplitter(chunk_size=512, chunk_overlap=25),
    SummaryExtractor(llm=llm, num_workers=num_workers),
    QuestionsAnsweredExtractor(llm=llm, num_workers=num_workers),
]
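For reference, here's a minimal sketch of the indexing call itself (the ./data folder is a placeholder; adjust to your setup). Passing show_progress=True is what enables the TQDM bar discussed below:
Python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load source files, then run the transformation pipeline while indexing
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    transformations=transformations,
    show_progress=True,  # renders the TQDM bar in the notebook
)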
10 comments
  1. Are you running this on a CPU-based machine?
  2. How much data are you indexing?
It's 367 documents, on a 4090.

On the large attempt, I just found the TQDM output (I had to open the notebook in Jupyter, as PyCharm didn't render it) and it showed 10K nodes, way more than I had anticipated. Someone reduced the chunk size in the sentence splitter (read: me). Oops.

I used a super small sample of 10 documents just now and looked closely at the iteration time for each chunk using my hardware monitor: it was about 4 seconds per node for the summary extractor alone.

So things were working, but I bit off more than my machine could chew with a single GPU.
One thing though: I know the back end is largely async calls, but is the TQDM progress bar holding at 0% until it's finished expected behavior?
I would expect this to take a while on a local machine, especially through Ollama. It's making 2 LLM calls per node, and each call takes maybe 30-60s; that's a lot of LLM calls.
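To put rough numbers on that, here's a back-of-envelope sketch using the figures from this thread (10K nodes, 2 extractor calls per node, ~45s midpoint per call, run sequentially):
Python
# Estimate only; the inputs are the numbers quoted in this thread
nodes = 10_000
calls_per_node = 2          # SummaryExtractor + QuestionsAnsweredExtractor
secs_per_call = 45          # midpoint of the observed 30-60s per call

total_calls = nodes * calls_per_node
hours = total_calls * secs_per_call / 3600
print(f"{total_calls:,} LLM calls ~= {hours:.0f} hours end to end")
# 20,000 LLM calls ~= 250 hours end to end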

Plus, Ollama can only process things sequentially; it has no batch processing or parallelism.

Not sure what's going on with the progress bar though.
(Like, llama-index is sending the requests to Ollama, but Ollama can only process them sequentially.)
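One way to check that serialization yourself: fire two requests concurrently and compare the wall-clock time to a single request. A hedged sketch, assuming a local Ollama server on the default port 11434 with llama3 pulled, using the standard /api/generate endpoint:
Python
import asyncio
import time

import httpx

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

async def generate(client: httpx.AsyncClient, prompt: str) -> None:
    # Non-streaming completion request against the local Ollama server
    await client.post(
        OLLAMA_URL,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=360.0,
    )

async def main() -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        # If the backend serializes requests, two "concurrent" calls
        # should take roughly twice the single-request latency.
        await asyncio.gather(
            generate(client, "Summarize this sentence."),
            generate(client, "Ask three questions about this sentence."),
        )
        print(f"two concurrent requests took {time.perf_counter() - start:.1f}s")

asyncio.run(main())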
Roger, any local alternatives to Ollama that can handle some level of parallelism?
llama.cpp perhaps?
I see llama.cpp is the backbone of Ollama. Ollama has some beta/experimental num_workers-style parameters that can be set when starting the service. Might try that.
Oh, that's new 👀
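For anyone trying this later: the experimental knobs appear to be the OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS environment variables (check the release notes for your Ollama version). A minimal sketch of starting the server with them set:
Python
import os
import subprocess

# Sketch only: launch the Ollama server with the experimental
# parallelism env vars set. Values here are illustrative.
env = dict(os.environ, OLLAMA_NUM_PARALLEL="4", OLLAMA_MAX_LOADED_MODELS="1")
subprocess.Popen(["ollama", "serve"], env=env)  # starts the server in the background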
The only other ones I know of that have some amount of batch/parallel processing are a bit more heavyweight: stuff like vLLM and TGI.
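If anyone goes the vLLM route, one hedged sketch of wiring it into llama-index: vLLM can expose an OpenAI-compatible server, which llama-index can talk to through its OpenAILike integration. The model name and port below are assumptions; adjust to your deployment:
Python
# Assumes a vLLM OpenAI-compatible server is running, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model meta-llama/Meta-Llama-3-8B-Instruct
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="unused",        # vLLM doesn't check keys by default
    is_chat_model=True,
)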