
Hey, I have one question: can I do batched inference with llama-index LLMs? Let's take two examples:

  1. APIs like the OpenAI API
  2. Direct access to OSS LLMs like Llama
In both cases, am I able to do generation in batches (assuming I have a GPU in option 2), or do I have to do it in a for loop?
2 comments
Currently it's a for loop (and if you use async aquery(), even better, at least for LLMs that have proper async support anyway)

Definitely planning better batch support though
I see, thanks Logan!
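
For reference, here is a minimal sketch of both approaches mentioned in the reply: a plain for loop over prompts, and firing the same calls concurrently with asyncio.gather via the async completion API. The import path and model name are assumptions (they vary by llama-index version); for a locally hosted OSS model you would swap the OpenAI wrapper for a local one such as HuggingFaceLLM.

```python
# Sketch only, assuming a recent llama-index release where the OpenAI
# wrapper lives at llama_index.llms.openai (import paths differ by version).
import asyncio

from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")  # hypothetical model choice
prompts = ["Explain RAG in one sentence.", "What is a vector store?"]

# Option 1: plain for loop, one synchronous completion per prompt.
sync_results = [llm.complete(p) for p in prompts]

# Option 2: run the same requests concurrently with the async API.
async def batched():
    return await asyncio.gather(*(llm.acomplete(p) for p in prompts))

async_results = asyncio.run(batched())
print([str(r) for r in async_results])
```

Note that the async version doesn't batch at the model level; it just overlaps the network calls, which is why it mainly helps with API-backed LLMs that have proper async support.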