Asyncio

Hey guys, I have an error when using the Sub_query_engine. I succeed in loading the program, but when getting the answer I see that it has generated sub-questions, and then I get an error from asyncio.run(). Does someone know how to fix that?
Attachment: image.png
When you run inside a notebook, it's usually good to run this before any other code

Plain Text
import nest_asyncio

# Patch the notebook's running event loop so nested
# asyncio.run() calls are allowed
nest_asyncio.apply()
Oh, I didn't know that. Does this mean that this error shouldn't happen in a plain Python program?
Yup exactly! It's only for notebooks (tldr is that notebooks are already running in an async loop, and you need to allow async nesting)
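For context, here's a minimal sketch of what's going on: Jupyter already has a running event loop, so a bare asyncio.run() raises a RuntimeError until nest_asyncio patches the loop.

Plain Text
import asyncio
import nest_asyncio

async def hello():
    return "hi"

# In a notebook cell, this line raises:
# RuntimeError: asyncio.run() cannot be called from a running event loop
# asyncio.run(hello())

# After patching, nested asyncio.run() calls work
nest_asyncio.apply()
print(asyncio.run(hello()))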
Hi @Logan M, I also used nest_asyncio.apply() but I am getting this error: NotImplementedError: Async generation not implemented for this LLM. This is the code snippet I am using for QnA:
Plain Text
import asyncio
import time

start_time = time.perf_counter()
query_engine = loaded_index.as_query_engine(
    text_qa_template=DEFAULT_TEXT_QA_PROMPT
)

# run each query in parallel
async def async_query(query_engine, questions):
    tasks = [query_engine.aquery(q) for q in questions]
    return await asyncio.gather(*tasks)

_ = asyncio.run(async_query(query_engine, query_list))
elapsed_time = time.perf_counter() - start_time

print(f"{elapsed_time:0.3f}s")
You are getting a different error. Async is not implemented for that LLM, which basically means there's no code written to handle this specific scenario πŸ€”

What LLM are you using?
I am using Flan-t5 large
Using the CustomLLM class from langchain
Yea, that does not support async, because HuggingFace does not have async predictions
Langchain would have to implement some wacky stuff to get that to work, I guess they haven't yet πŸ€”
You could try implementing an async def _acall() function
In the custom LLM class
If you had a method in mind for async prediction...
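A rough sketch of what that could look like, assuming a LangChain-style custom LLM (the class name and pipeline field here are hypothetical). Since HuggingFace pipelines are blocking, the async method just offloads the call to a worker thread:

Plain Text
import asyncio
from typing import Any, List, Optional

from langchain.llms.base import LLM

class FlanT5LLM(LLM):
    """Hypothetical custom LLM wrapping a synchronous HuggingFace pipeline."""

    pipeline: Any  # e.g. a transformers text2text-generation pipeline

    @property
    def _llm_type(self) -> str:
        return "flan-t5-custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Blocking prediction via the HuggingFace pipeline
        return self.pipeline(prompt)[0]["generated_text"]

    async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # HuggingFace has no native async API, so run the blocking call
        # in a worker thread to keep the event loop free
        return await asyncio.to_thread(self._call, prompt, stop)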
Actually, my use case is doing QnA 4 to 5 times in the same API call, and I am getting a CUDA out-of-memory error. I am using a single CustomLLM class for summarization, sentiment, and QnA. Can you tell me a method to increase the prediction speed without hitting the memory error?
I used batch_size when loading the HuggingFace pipeline, but it is not working. Sometimes I have a list of paragraphs for summarization, but in llama_index I have to send them one by one, and it is the same for QnA.
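For reference, a minimal sketch of batched inference done directly against a HuggingFace pipeline (outside llama_index); the model name and batch size are placeholders:

Plain Text
import torch
from transformers import pipeline

# Hypothetical setup: one shared pipeline reused for every request
generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-large",
    device=0,  # single GPU
)

def generate_batch(prompts, batch_size=4):
    # Passing the whole list with a batch_size lets the pipeline batch
    # internally instead of doing one forward pass per paragraph
    with torch.no_grad():
        outputs = generator(prompts, batch_size=batch_size, truncation=True)
    # Release cached GPU blocks between API calls to reduce OOM risk
    torch.cuda.empty_cache()
    return [o["generated_text"] for o in outputs]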