Async

At a glance

The community member asked whether there is a way to use an async function for querying an index with local embeddings. Another community member responded that in newer versions of LlamaIndex you can use await index.aquery(..), but noted that local embeddings have "fake async" since embedding is CPU-bound. The discussion then turned to the lack of streaming support in the async functions; one community member suggested it is just "tech debt", and another offered to look into implementing it as a pull request. The community members agreed that adding streaming support to the async functions would likely be relatively easy.

Hello,
Is there a way to use an async function for querying the index with local embeddings?
6 comments
Pretty sure you can do await index.aquery(..) in newer versions of llama index.

But note that local embeddings will have fake async (it's CPU bound, so there's no true async, the async embed function is just calling the sync function)
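
For concreteness, here is a minimal sketch of what that aquery() path can look like with a local embedding model. It assumes a recent llama-index release (0.10+) with the llama-index-embeddings-huggingface package, a ./data directory of documents, and an LLM configured elsewhere (the default query engine falls back to OpenAI and needs an API key); none of these specifics come from the thread.

```python
import asyncio

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model: it runs in-process and is CPU/GPU bound, so the
# "async" embedding step just wraps the sync call (the "fake async" above).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")


async def main() -> None:
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # Async query entry point; the LLM call is genuinely async, while the
    # local embedding of the query string is not.
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What does the document say about X?")
    print(response)


asyncio.run(main())
```
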
Thanks, is there a reason streaming isn’t supported for async?
Mmm nah, just tech debt I think
@Logan M, do you reckon it’s something hard to implement if I tried to do a PR? Otherwise I’m thinking about just using llama index for generating the prompt and then sending it to vLLM/TGI straight away to still have streaming
I think it would be pretty easy. Basically just a matter of copying stream() and calling llm.astream_chat or llm.astream_complete
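
For reference, a hedged sketch of the astream_complete() usage such a PR would build on when mirroring the sync stream() flow; the OpenAI wrapper, model name, and prompt here are placeholders rather than anything from the thread, and any LLM exposing astream_complete would work the same way.

```python
import asyncio

from llama_index.llms.openai import OpenAI


async def astream_answer(prompt: str) -> str:
    llm = OpenAI(model="gpt-3.5-turbo")

    # astream_complete() is awaited once and returns an async generator of
    # partial responses, each carrying the newly generated text in .delta.
    response_gen = await llm.astream_complete(prompt)

    text = ""
    async for chunk in response_gen:
        print(chunk.delta, end="", flush=True)
        text += chunk.delta
    return text


asyncio.run(astream_answer("Answer using the retrieved context: ..."))
```
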
Okay, I’ll look into it whenever I can, thanks 🙂