Hi! Is there a way to process Ollama calls in batch?

At a glance

The community members are discussing whether Ollama calls can be processed in batch, similar to the batch functionality in Langchain. One community member notes that Ollama only processes requests sequentially anyway and suggests using async calls for concurrency, though the bottleneck will remain the Ollama server.

The discussion then shifts to whether there is batching functionality for OpenAI in llama_index. A community member responds that there is currently no batching functionality, and suggests using async to run requests concurrently instead.

Another community member expresses interest in using OpenAI's new 24-hour turnaround batch feature for cost purposes, and asks if there is a way to implement it themselves, possibly using a custom LLM implementation. The response suggests that a custom implementation would be required to accept multiple inputs, but notes that the cost may not be reduced unless the 24-hour batch feature is used.

The community members conclude that the 24-hour batch feature could be useful for data pipelines or long-running production data ingestion tasks where live responses are not required.

Hi! Is there a way to process Ollama calls in batch? Like batch functionality in langchain
Ollama only processes requests sequentially anyways
You can do async calls for concurrency, but the bottleneck will be the Ollama server
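For reference, a minimal sketch of that async approach, assuming llama-index >= 0.10 with the llama-index-llms-ollama package installed and a local Ollama server running; the model name and prompts are placeholders:

```python
import asyncio

from llama_index.llms.ollama import Ollama

# Ollama serves requests one at a time, so this mainly keeps the client
# from blocking on each call; the Ollama server is still the bottleneck.
llm = Ollama(model="llama3", request_timeout=120.0)

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Explain what a vector index is.",
    "List three uses of embeddings.",
]

async def run_all():
    # Fire off all completions concurrently and wait for every result.
    tasks = [llm.acomplete(p) for p in prompts]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_all())
for prompt, result in zip(prompts, results):
    print(prompt, "->", result.text)
```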
@Logan M Alright. Thanks!
But @Logan M is there batching for OpenAI in llama_index?
not currently. I would just use async
to run requests concurrently
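The same asyncio.gather pattern works with the OpenAI LLM class in llama_index; a short sketch, assuming the llama-index-llms-openai package, an OPENAI_API_KEY in the environment, and reusing the asyncio import and prompts list from the Ollama sketch above:

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # model name is just an example

async def run_all_openai():
    # Unlike a local Ollama server, the OpenAI API serves many requests in
    # parallel (subject to rate limits), so concurrency gives a real speedup.
    return await asyncio.gather(*[llm.acomplete(p) for p in prompts])

results = asyncio.run(run_all_openai())
```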
I wanted to use OpenAI batch for cost purposes. Is there a way for me to implement it myself? May be using custom llm implementation?
Yea would have to be some custom implementation to accept multiple inputs. But even then, I think you'd only get use out of it by running the LLM object directly, since nothing else in the framework will know to take advantage of that πŸ€”

I think the cost will be the same though no? Unless by batch you mean that new 24-hour turnaround batch thing
Yes, I mean the new 24-hour turnaround.
For data pipelines or long-running production data ingestion stuff we don't need live responses in most cases.
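llama_index has no built-in hook for this, but as a rough sketch of what such a custom implementation could call, here is the OpenAI Batch API flow using the openai Python SDK (v1.x) directly; the model name, file name, and polling interval are illustrative assumptions:

```python
import json
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = ["Question 1 ...", "Question 2 ..."]

# Each line of the input file is one chat-completions request.
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")

# Upload the file and create a batch with the 24h completion window.
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until the batch finishes (may take up to 24 hours), then fetch results.
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    print(output.text)  # JSONL, one response per request
```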