sweet thanks! also, am I correct in thinking that async LLM calls during query() won't speed things up, since generating and refining a response requires sequential calls to the LLM?
@yoelk hmmm, the only disadvantage I can think of is that async might ping the embedding model a lot in a short amount of time, which can sometimes cause rate-limit errors and other load-related issues.
Maybe I'm missing something else though - I was kind of surprised it wasn't turned on by default tbh
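fwiw, here's a toy sketch of why both points hold - this isn't any library's actual API, `mock_model_call` just fakes network latency with `asyncio.sleep`. Refine-style synthesis feeds each answer into the next call, so the calls can't overlap and async buys you nothing; embedding calls are independent, so `asyncio.gather` overlaps them, which is exactly why all the requests land on the provider at once:

```python
import asyncio
import time

LATENCY = 0.05  # pretend each model call takes 50 ms

async def mock_model_call(prompt: str) -> str:
    # Stand-in for a network round trip to an LLM or embedding model.
    await asyncio.sleep(LATENCY)
    return f"response to {prompt!r}"

async def sequential_refine(chunks: list[str]) -> float:
    # Refine-style synthesis: each call depends on the previous answer,
    # so total time is the *sum* of the per-call latencies.
    start = time.perf_counter()
    answer = ""
    for chunk in chunks:
        answer = await mock_model_call(answer + chunk)
    return time.perf_counter() - start

async def concurrent_embed(chunks: list[str]) -> float:
    # Embedding calls are independent, so gather overlaps them and total
    # time is roughly *one* latency -- but every request hits the
    # provider simultaneously, which is where rate limits bite.
    start = time.perf_counter()
    await asyncio.gather(*(mock_model_call(c) for c in chunks))
    return time.perf_counter() - start

chunks = ["a", "b", "c", "d"]
seq = asyncio.run(sequential_refine(chunks))
par = asyncio.run(concurrent_embed(chunks))
print(f"sequential: {seq:.2f}s, concurrent: {par:.2f}s")
```

with 4 chunks the sequential path takes ~4x the latency while the gathered path takes ~1x, so async helps retrieval/embedding but not the refine loop.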