Asyncio

Hey guys, I have an error when using the Sub_query_engine. I succeed in loading the program, but when getting the answer I see that it has generated sub-questions, and then I get an error from asyncio.run(). Does someone know how to fix that?
Attachment: image.png
When you run inside a notebook, it's usually good to run this before any other code

Plain Text
import nest_asyncio

# Patch the notebook's running event loop so nested
# asyncio.run() calls are allowed
nest_asyncio.apply()
Oh, I didn't know that. Does this mean that this error shouldn't happen in a plain Python program?
Yup exactly! It's only for notebooks (tldr is that notebooks are already running in an async loop, and you need to allow async nesting)
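For context, here's a minimal sketch of what's going on: Jupyter already has a running event loop, so a bare asyncio.run() raises a RuntimeError until nest_asyncio patches the loop.

Plain Text
import asyncio
import nest_asyncio

async def hello():
    return "hi"

# In a notebook cell, this line raises:
# RuntimeError: asyncio.run() cannot be called from a running event loop
# asyncio.run(hello())

# After patching, nested asyncio.run() calls work
nest_asyncio.apply()
print(asyncio.run(hello()))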
Hi @Logan M, I also used nest_asyncio.apply() but I am getting this error: NotImplementedError: Async generation not implemented for this LLM. This is the code snippet I am using for QnA:
Plain Text
import asyncio
import time

start_time = time.perf_counter()
query_engine = loaded_index.as_query_engine(
    text_qa_template=DEFAULT_TEXT_QA_PROMPT
)

# run each query in parallel
async def async_query(query_engine, questions):
    tasks = [query_engine.aquery(q) for q in questions]
    return await asyncio.gather(*tasks)

_ = asyncio.run(async_query(query_engine, query_list))
elapsed_time = time.perf_counter() - start_time

print(f"{elapsed_time:0.3f}s")
You are getting a different error. Async is not implemented for that LLM, which basically means there's no code written to handle this specific scenario πŸ€”

What LLM are you using?
I am using Flan-t5 large
Using the CustomLLM class from langchain
Yea, that does not support async, because HuggingFace does not have async predictions
Langchain would have to implement some wacky stuff to get that to work, I guess they haven't yet πŸ€”
You could try implementing an async def _acall() function
In the custom LLM class
If you had a method in mind for async prediction...
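A rough sketch of what that could look like, assuming a LangChain-style custom LLM (the class name and pipeline field here are hypothetical). Since HuggingFace pipelines are blocking, the async method just offloads the call to a worker thread:

Plain Text
import asyncio
from typing import Any, List, Optional

from langchain.llms.base import LLM

class FlanT5LLM(LLM):
    """Hypothetical custom LLM wrapping a synchronous HuggingFace pipeline."""

    pipeline: Any  # e.g. a transformers text2text-generation pipeline

    @property
    def _llm_type(self) -> str:
        return "flan-t5-custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Blocking prediction via the HuggingFace pipeline
        return self.pipeline(prompt)[0]["generated_text"]

    async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # HuggingFace has no native async API, so run the blocking call
        # in a worker thread to keep the event loop free
        return await asyncio.to_thread(self._call, prompt, stop)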
Actually, my use case is doing QnA 4 to 5 times in the same API call, and I am getting a CUDA out-of-memory error. I am using a single CustomLLM class for summarization, sentiment, and QnA. Can you tell me a method to increase the prediction speed without hitting the memory error?
I used batch_size when loading the HuggingFace pipeline, but it is not working. Sometimes I have a list of paragraphs for summarization, but in llama_index I have to send them one by one, and it is the same for QnA.
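For reference, a minimal sketch of batched inference done directly against a HuggingFace pipeline (outside llama_index); the model name and batch size are placeholders:

Plain Text
import torch
from transformers import pipeline

# Hypothetical setup: one shared pipeline reused for every request
generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-large",
    device=0,  # single GPU
)

def generate_batch(prompts, batch_size=4):
    # Passing the whole list with a batch_size lets the pipeline batch
    # internally instead of doing one forward pass per paragraph
    with torch.no_grad():
        outputs = generator(prompts, batch_size=batch_size, truncation=True)
    # Release cached GPU blocks between API calls to reduce OOM risk
    torch.cuda.empty_cache()
    return [o["generated_text"] for o in outputs]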