The community member asked whether there is a way to use an async function for querying an index with local embeddings. Another community member responded that in newer versions of LlamaIndex you can use await index.aquery(..), but noted that local embeddings have "fake async" since embedding is CPU-bound. The discussion then turned to the lack of streaming support in the async functions, with one community member suggesting it is just "tech debt" and another offering to look into implementing it as a pull request. The community members agreed that adding streaming support to the async functions would likely be a relatively easy task.
Pretty sure you can do await index.aquery(..) in newer versions of llama index.
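Roughly like this (a minimal sketch, not tested; in more recent releases the async call usually goes through a query engine rather than the index itself, and the embedding model name and data path here are just placeholders):

```python
import asyncio

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


async def main():
    # Local embedding model, runs in-process (placeholder model name)
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

    # Async query entry point
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What does the doc say about X?")
    print(response)


asyncio.run(main())
```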
But note that local embeddings will have fake async (embedding is CPU-bound, so there's no true async; the async embed function just calls the sync one)
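To illustrate what "fake async" means here (purely illustrative, not LlamaIndex's actual implementation): the async method awaits nothing and just delegates to the sync path, so the event loop is blocked for the whole embedding call. One workaround is to push the sync call onto a worker thread:

```python
import asyncio


class LocalEmbedding:
    """Illustrative stand-in for a local, CPU-bound embedding model."""

    def get_text_embedding(self, text: str) -> list[float]:
        # CPU-bound model inference would happen here and block the thread.
        return [0.0] * 384  # placeholder vector

    async def aget_text_embedding(self, text: str) -> list[float]:
        # "Fake async": nothing is awaited, it just calls the sync path,
        # so the event loop is blocked for the duration of the call.
        return self.get_text_embedding(text)


async def embed_off_loop(model: LocalEmbedding, text: str) -> list[float]:
    # Keeps the event loop responsive by running the sync call in a thread;
    # most native inference ops release the GIL, so this helps in practice.
    return await asyncio.to_thread(model.get_text_embedding, text)
```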
@Logan M, do you reckon it would be hard to implement if I tried to do a PR? Otherwise I'm thinking of just using llama index to generate the prompt and then sending it straight to vLLM/TGI so I still get streaming.
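For that fallback route, a rough sketch (assuming a vLLM server exposing the OpenAI-compatible API; the URL, model name, prompt template, and top-k are placeholders, and the local embed model from the earlier snippet is omitted for brevity): llama index only does retrieval and prompt assembly, and the streaming comes from the server.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from openai import OpenAI

# Use llama index only for retrieval + prompt assembly.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=3)

question = "What does the doc say about X?"
context = "\n\n".join(node.get_content() for node in retriever.retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Stream the completion from a vLLM/TGI server with an OpenAI-compatible API
# (placeholder base_url and model name).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```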