
At a glance

The community member asks whether they can do batched inference with llama-index LLMs, both for APIs like the OpenAI API and for direct access to open-source LLMs like Llama. Another community member responds that inference is currently done in a for loop, but better batch support is planned. The original poster acknowledges the response.

Hey, I have one question: can I do batched inference with llama-index LLMs? Let's take two examples:

  1. APIs like the OpenAI API
  2. Direct access to OSS LLMs like Llama
In both cases, am I able to do generation in batches (assuming I have a GPU in option 2), or do I have to do it in a for loop?
Currently it's done in a for loop (and if you use async aquery(), even better, at least for LLMs that have proper async anyway)

Definitely planning better batch support though
I see, thanks Logan
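
For reference, here is a minimal sketch of the two approaches described in the answer, assuming llama-index's OpenAI LLM wrapper and its complete()/acomplete() methods (the import path varies between llama-index versions, and the same pattern applies to locally hosted OSS LLM wrappers):

```python
import asyncio

# Import path shown here is for newer llama-index versions; older
# versions used `from llama_index.llms import OpenAI`.
from llama_index.llms.openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
llm = OpenAI(model="gpt-3.5-turbo")
prompts = ["Tell me a joke.", "Summarize Hamlet in one sentence.", "What is 2 + 2?"]

# 1. Sequential: one request per prompt in a plain for loop.
responses = [llm.complete(p) for p in prompts]

# 2. Concurrent: issue the requests together via the async API.
async def run_batch(prompts):
    # acomplete() returns a coroutine per prompt; gather() runs them concurrently.
    return await asyncio.gather(*(llm.acomplete(p) for p in prompts))

responses = asyncio.run(run_batch(prompts))
```

The second variant doesn't batch requests into a single call, but overlapping the network round-trips usually gives most of the speedup for API-backed LLMs.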