Hi! Is there a way to process Ollama calls in batch? Like batch functionality in langchain
Ollama only processes requests sequentially anyways
You can do async calls for concurrency, but the bottleneck will be the Ollama server
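Something along these lines shows that async pattern (a minimal sketch, assuming the llama-index-llms-ollama package is installed and a local Ollama server already has a llama3 model pulled; the model name and prompts are illustrative):

```python
# Minimal sketch: fire several Ollama completions concurrently with asyncio.
# The requests are sent at once, but the Ollama server still works through
# them one at a time, so the server remains the bottleneck.
import asyncio

from llama_index.llms.ollama import Ollama


async def main() -> None:
    llm = Ollama(model="llama3", request_timeout=120.0)
    prompts = [
        "Summarize the theory of relativity in one sentence.",
        "Name three uses of a hash map.",
        "What is the capital of France?",
    ]
    # acomplete() is the async variant of complete(); gather() runs them concurrently.
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    for prompt, response in zip(prompts, responses):
        print(prompt, "->", response.text)


asyncio.run(main())
```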
@Logan M Alright. Thanks!
But @Logan M is there batching for OpenAI in llama_index?
not currently. I would just use async
to run requests concurrently
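The same gather pattern applies to the llama_index OpenAI wrapper (a sketch assuming llama-index-llms-openai is installed and OPENAI_API_KEY is set; the model name is illustrative):

```python
# Minimal sketch: concurrent OpenAI completions via llama_index's async API.
import asyncio

from llama_index.llms.openai import OpenAI


async def run(prompts: list[str]) -> list[str]:
    llm = OpenAI(model="gpt-4o-mini")
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    return [r.text for r in responses]


print(asyncio.run(run(["2+2?", "Capital of Japan?"])))
```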
I wanted to use OpenAI batching for cost purposes. Is there a way for me to implement it myself? Maybe using a custom LLM implementation?
Yea would have to be some custom implementation to accept multiple inputs. But even then, I think you'd only get use out of it by running the LLM object directly, since nothing else in the framework will know to take advantage of that πŸ€”

I think the cost will be the same though no? Unless by batch you mean that new 24-hour turnaround batch thing
yes I mean the new 24-hour turnaround.
For data pipelines or long-running production data ingestion stuff we don't need live responses in most cases.
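A rough sketch of what that custom piece could be built around, using the OpenAI Python SDK's Batch API directly rather than anything llama_index-specific (model name and prompts are illustrative; real polling and error handling are omitted):

```python
# Rough sketch: submitting prompts through the OpenAI Batch API
# (the ~24-hour, reduced-cost endpoint) with the openai Python SDK.
# A custom LLM wrapper could call this directly, outside the rest of
# the framework, since nothing else in llama_index will batch for you.
import json
import tempfile

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = ["Summarize doc A.", "Summarize doc B."]

# 1. Write one JSONL line per request.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")
    batch_input_path = f.name

# 2. Upload the file and create the batch job.
batch_file = client.files.create(file=open(batch_input_path, "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print("Batch submitted:", batch.id)

# 3. Later (up to 24 hours), check status and download the results.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results = client.files.content(batch.output_file_id).text
    for line in results.splitlines():
        print(json.loads(line)["custom_id"])
```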