The community members discuss the possibility of processing Ollama calls in batch, similar to the functionality in Langchain. One community member notes that Ollama only processes requests sequentially, and another suggests using async calls for concurrency, though the Ollama server itself will still be the bottleneck.
The discussion then shifts to whether there is batching functionality for OpenAI in llama_index. A community member responds that there is currently no batching functionality, and suggests using async to run requests concurrently instead.
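A minimal sketch of the async approach, assuming the llama-index-llms-ollama package is installed and an Ollama server is running locally; the model name and prompts are placeholders, and the same acomplete/asyncio.gather pattern applies to the OpenAI LLM class:

```python
import asyncio

from llama_index.llms.ollama import Ollama


async def run_batch(prompts: list[str]) -> list[str]:
    # Example model name; swap in whatever model the local Ollama server has pulled.
    llm = Ollama(model="llama3", request_timeout=120.0)
    # acomplete() lets the requests be issued concurrently from the client side;
    # the Ollama server still decides how many it actually processes in parallel.
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    return [r.text for r in responses]


if __name__ == "__main__":
    answers = asyncio.run(run_batch(["What is RAG?", "What is an embedding?"]))
    for answer in answers:
        print(answer)
```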
Another community member expresses interest in OpenAI's new 24-hour turnaround batch feature for cost reasons, and asks whether there is a way to implement it themselves, possibly via a custom LLM implementation. The response suggests that a custom implementation would be required to accept multiple inputs, and notes that the cost would not be reduced unless the 24-hour batch feature specifically is used.
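Since llama_index has no built-in support for this, one way to experiment is to call OpenAI's Batch API directly and feed the results back into a pipeline once they arrive. A rough sketch, assuming the openai Python client (v1+) and an OPENAI_API_KEY in the environment; file names, custom_ids, and the model are placeholders:

```python
import json

from openai import OpenAI

client = OpenAI()

prompts = ["Summarize document A", "Summarize document B"]

# Write one JSONL line per request in the Batch API's expected shape.
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")

# Upload the input file and create a batch with a 24-hour completion window.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll later; once the batch reports "completed", download the results file.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results = client.files.content(batch.output_file_id).text
    print(results)
```

Because results only come back within the 24-hour window, this fits offline ingestion jobs rather than the synchronous call-and-response interface the rest of the framework expects.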
The community members conclude that the 24-hour batch feature could be useful for data pipelines or long-running production data ingestion tasks where live responses are not required.
Yeah, it would have to be some custom implementation to accept multiple inputs. But even then, I think you'd only get use out of it by running the LLM object directly, since nothing else in the framework would know to take advantage of that.
I think the cost will be the same though, no? Unless by batch you mean that new 24-hour turnaround batch feature.