The community members discuss the possibility of processing Ollama calls in batch, similar to the functionality in Langchain. One community member notes that Ollama only processes requests sequentially, and another suggests using async calls for concurrency, though the Ollama server itself will still be the bottleneck.
The discussion then shifts to whether there is batching functionality for OpenAI in llama_index. A community member responds that there is currently no batching functionality, and suggests using async to run requests concurrently instead.
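A minimal sketch of the async approach, assuming the llama-index-llms-ollama package is installed and an Ollama server is running locally; the model name and prompts are placeholders, and the same acomplete/asyncio.gather pattern applies to the OpenAI LLM class:

```python
import asyncio

from llama_index.llms.ollama import Ollama


async def run_batch(prompts: list[str]) -> list[str]:
    # Example model name; swap in whatever model the local Ollama server has pulled.
    llm = Ollama(model="llama3", request_timeout=120.0)
    # acomplete() lets the requests be issued concurrently from the client side;
    # the Ollama server still decides how many it actually processes in parallel.
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    return [r.text for r in responses]


if __name__ == "__main__":
    answers = asyncio.run(run_batch(["What is RAG?", "What is an embedding?"]))
    for answer in answers:
        print(answer)
```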
Another community member expresses interest in OpenAI's new 24-hour turnaround batch feature for cost reasons, and asks whether there is a way to implement it themselves, possibly via a custom LLM implementation. The response suggests that a custom implementation would be required to accept multiple inputs, and notes that the cost would not be reduced unless the 24-hour batch feature specifically is used.
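Since llama_index has no built-in support for this, one way to experiment is to call OpenAI's Batch API directly and feed the results back into a pipeline once they arrive. A rough sketch, assuming the openai Python client (v1+) and an OPENAI_API_KEY in the environment; file names, custom_ids, and the model are placeholders:

```python
import json

from openai import OpenAI

client = OpenAI()

prompts = ["Summarize document A", "Summarize document B"]

# Write one JSONL line per request in the Batch API's expected shape.
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")

# Upload the input file and create a batch with a 24-hour completion window.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll later; once the batch reports "completed", download the results file.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results = client.files.content(batch.output_file_id).text
    print(results)
```

Because results only come back within the 24-hour window, this fits offline ingestion jobs rather than the synchronous call-and-response interface the rest of the framework expects.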
The community members conclude that the 24-hour batch feature could be useful for data pipelines or long-running production data ingestion tasks where live responses are not required.
Yeah, it would have to be some custom implementation to accept multiple inputs. But even then, I think you'd only get use out of it by running the LLM object directly, since nothing else in the framework would know to take advantage of that.
I think the cost will be the same though, no? Unless by batch you mean that new 24-hour turnaround batch feature.