Async

Hello,
Is there a way to use an async function to query the index with local embeddings?
Pretty sure you can do await index.aquery(..) in newer versions of llama index.

But note that local embeddings will have fake async (embedding is CPU-bound, so there's no true async; the async embed function just calls the sync one)
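For example, something along these lines (a minimal sketch, assuming a recent llama-index release where the async entry point lives on the query engine, and a HuggingFace model standing in for "local embeddings"; adjust the imports and model name to your setup):

```python
import asyncio

# Minimal sketch, assuming the llama_index.core namespace from recent
# releases and the HuggingFace embeddings integration. The model name and
# "data" directory are placeholders.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model; its async embed path just wraps the sync call,
# so the embedding step itself is not truly concurrent.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

async def main() -> None:
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    # Async query: the LLM call is awaited; the query embedding still runs
    # synchronously under the hood because it is CPU-bound.
    response = await query_engine.aquery("What does the document say?")
    print(response)

asyncio.run(main())
```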
Thanks, is there a reason streaming isn’t supported for async?
Mmm nah, just tech debt I think
@Logan M, do you reckon it would be hard to implement if I tried to do a PR? Otherwise I’m thinking of just using llama index to generate the prompt and then sending it to vLLM/TGI directly so I still get streaming, roughly like the sketch below.
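Roughly what I have in mind (a rough sketch assuming a vLLM server exposing its OpenAI-compatible endpoint; the base URL, model name, and prompt template are placeholders, not anything prescribed by llama index):

```python
# Sketch of the workaround: let llama-index do retrieval and prompt
# assembly, then stream the completion straight from a vLLM server via
# its OpenAI-compatible API, bypassing llama-index's LLM layer.
from openai import AsyncOpenAI

async def answer(index, question: str) -> None:
    # Retrieval with llama-index (sync is fine here, it's local and CPU-bound).
    retriever = index.as_retriever(similarity_top_k=3)
    nodes = retriever.retrieve(question)
    context = "\n\n".join(n.get_content() for n in nodes)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    # Stream tokens directly from vLLM.
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="none")
    stream = await client.chat.completions.create(
        model="my-served-model",  # whatever model vLLM is serving
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```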
I think it would be pretty easy. Basically just a matter of copying stream() and calling llm.astream_chat or llm.astream_complete
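For reference, the underlying async streaming call looks roughly like this (a sketch using the OpenAI LLM class and model name purely as examples; any llama-index LLM that implements astream_complete behaves the same way):

```python
import asyncio

# Sketch of the async streaming LLM call that an async streaming query
# path would wrap.
from llama_index.llms.openai import OpenAI

async def main() -> None:
    llm = OpenAI(model="gpt-3.5-turbo")
    # astream_complete returns an async generator of incremental responses.
    gen = await llm.astream_complete("Write one sentence about async IO.")
    async for partial in gen:
        print(partial.delta, end="", flush=True)

asyncio.run(main())
```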
Okay, I’ll look into it whenever I can, thanks 🙂