Is it possible to make, say, 10 queries to an LLM and put them in a single batch to get faster results? I know I can do asynchronous querying, but batching is different; ideally I would like to do both. Thanks.
Thank you Logan. I must say, given the level of generality of LlamaIndex and the number of features it offers, the absence of batching is rather surprising to me. Batching seems like a relatively simple performance win, and in fact batching and async could be combined. Thanks again.
We're not currently working on it. Most APIs either don't support batching or batch internally based on incoming requests. None of our LLM interfaces, or the components that consume LLMs, expect batched inputs either, so it would take a lot of work to even make it useful.
Since async works just fine, it's lower priority at the moment.
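
For anyone landing here, a minimal sketch of the async approach Logan mentions, firing 10 queries concurrently with `asyncio.gather`. It assumes the `llama_index.llms.openai` import path and the LLM's `acomplete` method; the model name and prompts are placeholders, so adjust to your setup.

```python
import asyncio
from llama_index.llms.openai import OpenAI  # assumed import path; older releases use llama_index.llms

async def run_queries(prompts):
    llm = OpenAI(model="gpt-3.5-turbo")  # hypothetical model choice
    # Each prompt is still a separate API call, but all of them are in flight
    # at once, so total latency is roughly that of the slowest single call.
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    return [str(r) for r in responses]

prompts = [f"Question {i}: ..." for i in range(10)]
answers = asyncio.run(run_queries(prompts))
```

This gets most of the latency benefit of batching without the API needing native batch support, which is presumably why async alone is considered good enough for now.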