Indexing

Help with creating a vector store: the TQDM progress bar sits at 0% until it completes (if it completes). Perhaps I'm going about this incorrectly somehow, but building an index with VectorStoreIndex.from_documents is taking a monumentally long time.
Python
# Imports assume the llama-index >= 0.10 package layout
from llama_index.core import Settings
from llama_index.core.extractors import QuestionsAnsweredExtractor, SummaryExtractor
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.ollama import Ollama

# Local llama3 served by Ollama; long timeout for slow generations
llm = Ollama(model="llama3", request_timeout=360.0)
Settings.llm = llm

# Both extractors make one LLM call per node
num_workers = 2
transformations = [
    SentenceSplitter(chunk_size=512, chunk_overlap=25),
    SummaryExtractor(llm=llm, num_workers=num_workers),
    QuestionsAnsweredExtractor(llm=llm, num_workers=num_workers),
]
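For reference, here's a minimal sketch of the indexing call itself (the ./data folder is a placeholder; adjust to your setup). Passing show_progress=True is what enables the TQDM bar discussed below:
Python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load source files, then run the transformation pipeline while indexing
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    transformations=transformations,
    show_progress=True,  # renders the TQDM bar in the notebook
)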
10 comments
  1. Are you running this on a CPU-based machine?
  2. How much data are you indexing?
It's 367 documents, on a 4090.

On the large attempt, I just found the TQDM output (I had to open the notebook in Jupyter, as PyCharm didn't render it) and it showed 10K nodes, way more than I had anticipated. Someone reduced the chunk size in the sentence splitter (read: me). Oops.

I used a super small sample of 10 documents just now and looked closely at the iteration time for each chunk using my hardware monitor: it was about 4 seconds per node for the summary extractor alone.

So things were working, but I bit off more than my machine could chew with a single GPU.
One thing though: I know the back end is largely async calls, but is the TQDM progress bar holding at 0% until it's finished expected behavior?
I would expect this to take a while on a local machine, especially through Ollama. It's making 2 LLM calls per node, and each call takes maybe 30-60s; that's a lot of LLM calls.
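To put rough numbers on that, here's a back-of-envelope sketch using the figures from this thread (10K nodes, 2 extractor calls per node, ~45s midpoint per call, run sequentially):
Python
# Estimate only; the inputs are the numbers quoted in this thread
nodes = 10_000
calls_per_node = 2          # SummaryExtractor + QuestionsAnsweredExtractor
secs_per_call = 45          # midpoint of the observed 30-60s per call

total_calls = nodes * calls_per_node
hours = total_calls * secs_per_call / 3600
print(f"{total_calls:,} LLM calls ~= {hours:.0f} hours end to end")
# 20,000 LLM calls ~= 250 hours end to end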

Plus, Ollama can only process things sequentially; it has no batch processing or parallelism.

Not sure what's going on with the progress bar though.
(Like, llama-index is sending the requests to Ollama, but Ollama can only process them sequentially.)
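One way to check that serialization yourself: fire two requests concurrently and compare the wall-clock time to a single request. A hedged sketch, assuming a local Ollama server on the default port 11434 with llama3 pulled, using the standard /api/generate endpoint:
Python
import asyncio
import time

import httpx

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

async def generate(client: httpx.AsyncClient, prompt: str) -> None:
    # Non-streaming completion request against the local Ollama server
    await client.post(
        OLLAMA_URL,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=360.0,
    )

async def main() -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        # If the backend serializes requests, two "concurrent" calls
        # should take roughly twice the single-request latency.
        await asyncio.gather(
            generate(client, "Summarize this sentence."),
            generate(client, "Ask three questions about this sentence."),
        )
        print(f"two concurrent requests took {time.perf_counter() - start:.1f}s")

asyncio.run(main())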
Roger, any local alternatives to Ollama that can handle some level of parallelism?
llama.cpp perhaps?
I see llama.cpp is the backbone of Ollama. Ollama has some beta/experimental num_workers-style parameters that can be set when starting the service. Might try that.
Oh, that's new 👀
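For anyone trying this later: the experimental knobs appear to be the OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS environment variables (check the release notes for your Ollama version). A minimal sketch of starting the server with them set:
Python
import os
import subprocess

# Sketch only: launch the Ollama server with the experimental
# parallelism env vars set. Values here are illustrative.
env = dict(os.environ, OLLAMA_NUM_PARALLEL="4", OLLAMA_MAX_LOADED_MODELS="1")
subprocess.Popen(["ollama", "serve"], env=env)  # starts the server in the background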
The only other ones I know of that have some amount of batch/parallel processing are a bit more heavyweight: stuff like vLLM and TGI.
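If anyone goes the vLLM route, one hedged sketch of wiring it into llama-index: vLLM can expose an OpenAI-compatible server, which llama-index can talk to through its OpenAILike integration. The model name and port below are assumptions; adjust to your deployment:
Python
# Assumes a vLLM OpenAI-compatible server is running, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model meta-llama/Meta-Llama-3-8B-Instruct
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="unused",        # vLLM doesn't check keys by default
    is_chat_model=True,
)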