The community member asked whether there is a way to use an async function for querying an index with local embeddings. Another community member responded that in newer versions of LlamaIndex you can use await index.aquery(..), but noted that local embeddings have "fake async" since embedding is CPU-bound. The discussion then turned to the lack of streaming support in the async functions, with one community member suggesting it is just "tech debt" and another offering to look into implementing it as a pull request. The community members agreed that adding streaming support to the async functions would likely be a relatively easy task.
Pretty sure you can do await index.aquery(..) in newer versions of llama index.
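Roughly like this (a minimal sketch, not tested; in more recent releases the async call usually goes through a query engine rather than the index itself, and the embedding model name and data path here are just placeholders):

```python
import asyncio

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


async def main():
    # Local embedding model, runs in-process (placeholder model name)
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

    # Async query entry point
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What does the doc say about X?")
    print(response)


asyncio.run(main())
```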
But note that local embeddings will have fake async (embedding is CPU-bound, so there's no true async; the async embed function just calls the sync one)
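To illustrate what "fake async" means here (purely illustrative, not LlamaIndex's actual implementation): the async method awaits nothing and just delegates to the sync path, so the event loop is blocked for the whole embedding call. One workaround is to push the sync call onto a worker thread:

```python
import asyncio


class LocalEmbedding:
    """Illustrative stand-in for a local, CPU-bound embedding model."""

    def get_text_embedding(self, text: str) -> list[float]:
        # CPU-bound model inference would happen here and block the thread.
        return [0.0] * 384  # placeholder vector

    async def aget_text_embedding(self, text: str) -> list[float]:
        # "Fake async": nothing is awaited, it just calls the sync path,
        # so the event loop is blocked for the duration of the call.
        return self.get_text_embedding(text)


async def embed_off_loop(model: LocalEmbedding, text: str) -> list[float]:
    # Keeps the event loop responsive by running the sync call in a thread;
    # most native inference ops release the GIL, so this helps in practice.
    return await asyncio.to_thread(model.get_text_embedding, text)
```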
@Logan M, do you reckon it would be hard to implement if I tried to do a PR? Otherwise I'm thinking of just using llama index to generate the prompt and then sending it straight to vLLM/TGI so I still get streaming.
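For that fallback route, a rough sketch (assuming a vLLM server exposing the OpenAI-compatible API; the URL, model name, prompt template, and top-k are placeholders, and the local embed model from the earlier snippet is omitted for brevity): llama index only does retrieval and prompt assembly, and the streaming comes from the server.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from openai import OpenAI

# Use llama index only for retrieval + prompt assembly.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=3)

question = "What does the doc say about X?"
context = "\n\n".join(node.get_content() for node in retriever.retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Stream the completion from a vLLM/TGI server with an OpenAI-compatible API
# (placeholder base_url and model name).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```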