
At a glance

The community member asks whether they can do batched inference with llama-index LLMs, both for APIs like the OpenAI API and for direct access to open-source LLMs like Llama. Another community member responds that inference is currently done in a for loop, but better batch support is planned. The original poster acknowledges the response.

Hey, I have one question: can I do batched inference with llama-index LLMs? Let's take two examples:

  1. APIs like the OpenAI API
  2. Direct access to OSS LLMs like Llama
In both cases, am I able to do generation in batches (assuming I have a GPU in option 2), or do I have to do it in a for loop?
Currently it's done in a for loop (and if you use async aquery(), even better, at least for LLMs that have proper async anyway)

Definitely planning better batch support though
I see, thanks Logan
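
For reference, here is a minimal sketch of the two approaches described in the answer, assuming llama-index's OpenAI LLM wrapper and its complete()/acomplete() methods (the import path varies between llama-index versions, and the same pattern applies to locally hosted OSS LLM wrappers):

```python
import asyncio

# Import path shown here is for newer llama-index versions; older
# versions used `from llama_index.llms import OpenAI`.
from llama_index.llms.openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
llm = OpenAI(model="gpt-3.5-turbo")
prompts = ["Tell me a joke.", "Summarize Hamlet in one sentence.", "What is 2 + 2?"]

# 1. Sequential: one request per prompt in a plain for loop.
responses = [llm.complete(p) for p in prompts]

# 2. Concurrent: issue the requests together via the async API.
async def run_batch(prompts):
    # acomplete() returns a coroutine per prompt; gather() runs them concurrently.
    return await asyncio.gather(*(llm.acomplete(p) for p in prompts))

responses = asyncio.run(run_batch(prompts))
```

The second variant doesn't batch requests into a single call, but overlapping the network round-trips usually gives most of the speedup for API-backed LLMs.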