Pretty sure you can do `await index.aquery(...)` in newer versions of LlamaIndex.
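Roughly something like this (untested sketch, assuming a standard VectorStoreIndex; in recent versions the async call lives on the query engine, so the exact entry point may differ by version):

```python
# Rough sketch, not tested -- assumes the async entry point is on the query
# engine (aquery), as in recent LlamaIndex versions.
import asyncio
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

async def main():
    docs = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(docs)
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What does the doc say about X?")
    print(response)

asyncio.run(main())
```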
But note that local embeddings will have fake async (embedding is CPU-bound, so there's no true async; the async embed function just calls the sync one).
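i.e. the pattern is roughly this (illustrative only, not the exact LlamaIndex class or method names), so awaiting it still blocks the event loop while the model runs:

```python
# Illustrative only -- shows the "fake async" pattern for local embeddings,
# not the actual LlamaIndex implementation.
class LocalEmbedding:
    def get_text_embedding(self, text: str) -> list[float]:
        # CPU-bound model forward pass happens here and blocks the thread.
        ...

    async def aget_text_embedding(self, text: str) -> list[float]:
        # No real concurrency: the async method just calls the sync one.
        return self.get_text_embedding(text)
```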
@Logan M, do you reckon it would be hard to implement if I tried to do a PR? Otherwise I'm thinking about just using LlamaIndex to generate the prompt and then sending it straight to vLLM/TGI so I still get streaming.
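Something like this is what I had in mind as the fallback: retrieval and prompt building with LlamaIndex, then streaming straight from vLLM's OpenAI-compatible server (the model name, URL, and prompt template below are placeholders, not anything from the docs):

```python
# Rough sketch of the fallback plan -- model name, base_url and the prompt
# template are placeholders; assumes vLLM is running its OpenAI-compatible server.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from openai import OpenAI

docs = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(docs)
retriever = index.as_retriever(similarity_top_k=3)

question = "What does the doc say about X?"
context = "\n\n".join(n.node.get_content() for n in retriever.retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# vLLM exposes an OpenAI-compatible endpoint, so the stock client can stream tokens.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
stream = client.completions.create(model="my-model", prompt=prompt, stream=True, max_tokens=256)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
```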