

Is it possible to make, say, 10 queries to an LLM in a single batch?

At a glance

The community member asks whether multiple queries to a large language model (LLM) can be sent in a single batch for faster results, in addition to using asynchronous querying. The replies indicate that batching is not currently supported; one community member states that "only async right now" is available. Another community member expresses surprise, since batching seems like a simple performance win, and suggests that batching and async could be combined. After a discussion of the challenges involved, the maintainers confirm they are not working on batching: most APIs don't support it, and integrating it with the existing LLM interfaces would require significant work. Since async querying works well, batching is a lower priority at the moment.

Is it possible to make, say, 10 queries to an LLM and put them in a single batch to get faster results? I know I can do asynchronous querying, but this is different. I would like to do both. Thanks.
nope, only async right now
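For readers landing here, this is roughly what the async route looks like. A minimal sketch, assuming the OpenAI LLM wrapper from llama_index (the import path varies across LlamaIndex versions); the model name and prompts are illustrative:

```python
# Minimal sketch of concurrent (async) querying with LlamaIndex.
# Assumption: the OpenAI LLM wrapper; import path varies by version.
import asyncio

from llama_index.llms.openai import OpenAI


async def run_queries(prompts: list[str]) -> list[str]:
    llm = OpenAI(model="gpt-3.5-turbo")  # illustrative model name
    # acomplete is the async counterpart of complete; gather fires all
    # requests concurrently instead of one after another.
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    return [r.text for r in responses]


prompts = [f"Summarize topic {i} in one sentence." for i in range(10)]
answers = asyncio.run(run_queries(prompts))
```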
Thank you Logan. I must say, given the level of generality of LlamaIndex and the number of features it offers, this is rather surprising to me. Batching seems like a relatively simple performance win. In fact, batching and async could be combined. Thanks again.
Ok, I understand the challenges involved with batching. Are you guys working on it? I realize it could take some time.
Not currently working on it. Most APIs either don't support batching or batch internally based on incoming requests. None of our LLM interfaces, or the things that work with LLMs, expect batching either, which means it would take a lot of work to even make it useful.

Since async works just fine, it's lower priority at the moment.
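One way to approximate the batching-plus-async combination suggested above, without any provider batch endpoint, is to fan prompts out asynchronously while capping how many are in flight at once. A sketch under the same assumptions as before (llama_index OpenAI wrapper; the concurrency limit and model name are illustrative, and this is application-level code, not a LlamaIndex batching API):

```python
# Sketch: emulate "batching" on top of async at the application level.
# Assumption: llama_index OpenAI wrapper; limit and model are illustrative.
import asyncio

from llama_index.llms.openai import OpenAI


async def query_with_limit(prompts: list[str], limit: int = 5) -> list[str]:
    llm = OpenAI(model="gpt-3.5-turbo")
    sem = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with sem:  # at most `limit` requests in flight at once
            response = await llm.acomplete(prompt)
            return response.text

    # Results come back in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```

The semaphore keeps throughput high while avoiding rate-limit errors that a fully unbounded gather can trigger on large prompt lists.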