

Is it possible to make, say, 10 queries to an LLM in a single batch?

At a glance

The community member asks whether multiple queries to a large language model (LLM) can be sent in a single batch for faster results, in addition to using asynchronous querying. The replies indicate that batching is not currently supported; one community member states that "only async right now" is available. Another community member expresses surprise, since batching seems like a simple performance win, and suggests that batching and async could be combined. After a discussion of the challenges involved, the maintainers confirm they are not working on batching: most APIs don't support it, and integrating it with the existing LLM interfaces would require significant work. Since async querying works well, batching is a lower priority at the moment.

Is it possible to make, say, 10 queries to an LLM and put them in a single batch to get faster results? I know I can do asynchronous querying, but this is different. I would like to do both. Thanks.
nope, only async right now
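For readers landing here, this is roughly what the async route looks like. A minimal sketch, assuming the OpenAI LLM wrapper from llama_index (the import path varies across LlamaIndex versions); the model name and prompts are illustrative:

```python
# Minimal sketch of concurrent (async) querying with LlamaIndex.
# Assumption: the OpenAI LLM wrapper; import path varies by version.
import asyncio

from llama_index.llms.openai import OpenAI


async def run_queries(prompts: list[str]) -> list[str]:
    llm = OpenAI(model="gpt-3.5-turbo")  # illustrative model name
    # acomplete is the async counterpart of complete; gather fires all
    # requests concurrently instead of one after another.
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    return [r.text for r in responses]


prompts = [f"Summarize topic {i} in one sentence." for i in range(10)]
answers = asyncio.run(run_queries(prompts))
```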
Thank you Logan. I must say, given the level of generality of LlamaIndex and the number of features it offers, this is rather surprising to me. Batching seems like a relatively simple performance win. In fact, batching and async could be combined. Thanks again.
Ok, I understand the challenges involved with batching. Are you guys working on it? I realize it could take some time.
Not currently working on it. Most APIs either don't support batching or batch internally based on incoming requests. None of our LLM interfaces, or the things that work with LLMs, expect batching either, which means it would take a lot of work to even make it useful.

Since async works just fine, it's lower priority at the moment.
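One way to approximate the batching-plus-async combination suggested above, without any provider batch endpoint, is to fan prompts out asynchronously while capping how many are in flight at once. A sketch under the same assumptions as before (llama_index OpenAI wrapper; the concurrency limit and model name are illustrative, and this is application-level code, not a LlamaIndex batching API):

```python
# Sketch: emulate "batching" on top of async at the application level.
# Assumption: llama_index OpenAI wrapper; limit and model are illustrative.
import asyncio

from llama_index.llms.openai import OpenAI


async def query_with_limit(prompts: list[str], limit: int = 5) -> list[str]:
    llm = OpenAI(model="gpt-3.5-turbo")
    sem = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with sem:  # at most `limit` requests in flight at once
            response = await llm.acomplete(prompt)
            return response.text

    # Results come back in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```

The semaphore keeps throughput high while avoiding rate-limit errors that a fully unbounded gather can trigger on large prompt lists.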