Hey, I have a question: can I do batched inference with llama-index LLMs? Let's take two examples:
- Hosted APIs like the OpenAI API
- Direct access to OSS LLMs like Llama
In both cases, am I able to run generation in batches (assuming I have a GPU in option 2), or do I have to loop over the prompts one at a time, like in the sketch below?
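To make the question concrete, here's a minimal sketch of what I'm doing now. The model name and prompts are just placeholders, and I'm assuming the `llama_index.llms.openai` import path from recent llama-index versions:

```python
import asyncio

from llama_index.llms.openai import OpenAI

# Placeholder model name; any llama-index LLM with complete/acomplete would do.
llm = OpenAI(model="gpt-4o-mini")
prompts = ["Summarize doc A.", "Summarize doc B.", "Summarize doc C."]

# What I do today: one sequential round trip per prompt.
responses = [llm.complete(p) for p in prompts]

# Closest alternative I've found: fire the requests concurrently via the
# async variant. That hides API latency for option 1, but for option 2 it is
# still N separate forward passes, not one batched pass on the GPU.
async def run_concurrent(ps):
    return await asyncio.gather(*(llm.acomplete(p) for p in ps))

responses_async = asyncio.run(run_concurrent(prompts))
```

For option 2 specifically, I'm asking whether a local wrapper (HuggingFaceLLM or similar, if I have the name right) can take a list of prompts and run them through the model in a single batched forward pass, or whether a loop like the above is the only option.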