I keep running into an issue where whenever I set use_async=True for my query engine, I encounter an asyncio.exceptions.CancelledError

The relevant portions of the stack trace seem to be:

Plain Text
File "/path/to/lib/python3.11/site-packages/llama_index/query_engine/sub_question_query_engine.py", line 226, in _aquery_subq
    response = await query_engine.aquery(question)
...
  File "/path/to/lib/python3.11/site-packages/llama_index/response_synthesizers/refine.py", line 325, in aget_response
    response = await self._agive_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/llama_index/response_synthesizers/refine.py", line 430, in _agive_response_single
    structured_response = await program.acall(
                          ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/llama_index/response_synthesizers/refine.py", line 79, in acall
    answer = await self._llm.apredict(
...
  File "/path/to/lib/python3.11/site-packages/openai/_base_client.py", line 1536, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/openai/_base_client.py", line 1315, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/openai/_base_client.py", line 1339, in _request
    response = await self._client.send(
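For reference, a minimal sketch of the kind of setup that triggers this (the directory, tool name, and description are placeholders; the relevant part is passing use_async=True to SubQuestionQueryEngine.from_defaults):

Python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# Placeholder index built from local documents.
documents = SimpleDirectoryReader("./papers").load_data()
index = VectorStoreIndex.from_documents(documents)

tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(name="papers", description="Medical research papers"),
    )
]

# use_async=True runs the generated sub-questions concurrently; this is the
# flag that surfaces the asyncio.exceptions.CancelledError above.
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    use_async=True,
)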
Yea that makes sense

One idea I had was changing this try/except to catch any exception. I have a feeling one of the sub-queries is failing and we just need to find out why

https://github.com/run-llama/llama_index/blob/cd892a51d31cc3aa47b7adf9db60e88d9ee96188/llama_index/query_engine/sub_question_query_engine.py#L236
is there a quick way I can test that? here's where I run into a little bit of my shortcomings w/ python
like ... I can probably hack something together real quick, but if you have a good idea off the top of your head that'd probably be faster πŸ˜„
oh, I guess I can just say query_engine._aquery_subq = hacked_func
oh, poop... gotta actually extend the class
okie dokie, got that set up and testing now
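(For anyone following along, the "extend the class" experiment looks roughly like this. The subclass and logging are a debugging hack, not part of llama_index; the method name _aquery_subq comes from the traceback above and its exact signature may vary by version.)

Python
import logging

from llama_index.query_engine import SubQuestionQueryEngine


class DebugSubQuestionQueryEngine(SubQuestionQueryEngine):
    """Same engine, but logs anything that escapes a sub-question."""

    async def _aquery_subq(self, sub_q, color=None):
        try:
            return await super()._aquery_subq(sub_q, color=color)
        except BaseException:
            # Deliberately broader than `except Exception` so that
            # asyncio.CancelledError gets logged too before the task dies.
            logging.exception("Sub-question failed: %s", sub_q.sub_question)
            return None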
okay, so where it fails, it doesn't actually enter that function
wait, lies, yes it does
but the except didn't catch it?
you can see here what I changed
and there's no warning message saying "tool failed to run question" anywhere
Exception is the base class for all error types, right?
okay, using BaseException in the except caught it...

Plain Text
03:47:17.100 [DEBUG   ]       httpcore.connection - connect_tcp.started host='192.168.0.2' port=8000 local_address=None timeout=180.0 socket_options=None
03:47:17.101 [DEBUG   ]       httpcore.connection - connect_tcp.failed exception=CancelledError()
03:47:17.101 [WARNING ]                      root - [context] Failed to run What are the benefits of Retatrutide?
03:47:17.101 [ERROR   ]                      root -


apparently I failed to log the error though πŸ˜…
not much more info than we already know from logging the exception
on the upside, that change prevents the crash
on the downside, still not sure why the future is being marked as cancelled
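(Side note for readers: the reason `except Exception` missed it is that asyncio.CancelledError has inherited from BaseException rather than Exception since Python 3.8. A more targeted hack than catching BaseException would be to catch the cancellation explicitly; a rough sketch with a hypothetical wrapper:)

Python
import asyncio
import logging


async def run_subquestion(query_engine, question):
    # Hypothetical wrapper just to illustrate the exception hierarchy.
    try:
        return await query_engine.aquery(question)
    except asyncio.CancelledError:
        # Not an Exception subclass, so a plain `except Exception` never sees it.
        logging.warning("Sub-question cancelled: %s", question)
        raise  # re-raise so cancellation still propagates properly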
Dang πŸ€” well, at least we made it work

One thing I was maybe thinking of was to use non-local embeddings (i.e. OpenAI).

Another option might be to try setting use_async=False and see if there's some other error...
I think I need to go and read more on how to debug this error properly lol
haha, fair enough
use_async=False has been my workaround, but it tends to cause my request to time out: with async it returns in under a minute, but without async it can take upwards of 10 minutes
what is your thought on the non-local embeddings, though?
on a small tangent, I've noticed that since Mistral isn't familiar with the term "Retatrutide", it really struggles with spelling it correctly, which leads to it looking up lots of bizarre stuff from the store... I was thinking that it must be an issue with the tokenizer having not seen it enough.
Is that an English word? πŸ˜…
Local embeddings aren't actually async; I was thinking running multiple sub-queries might be causing a weird race condition in huggingface 🤔 I mean, I'm not sure how that would happen either though lol
haha, it's the name of a medicine. The bot I'm working on is a QA bot to help with research on different medicines and their interactions... Currently have roughly 1000 research papers ingested into my store and if I ask it a question about something the model was likely trained on (i.e. it has a wiki page.. like albuterol) it nails it, but retatrutide is new and currently in phase 3 trials, so it probably hasn't come across it much, if at all
Ah, I see.. my thought process was to use the vllm module inside llama index directly instead of accessing it through http.. just sorta eliminate that variable
Ah yea, that's a good idea too πŸ‘ except without http you also lose async (so it will run sequentially essentially right?)
Oh, is that right?
Welp, dang. I didn't get that far in my research, lol
ah, I see that acomplete is marked as Not Implemented... is there any reason why, or is it just not implemented yet?
If it were implemented, it could only call complete() secretly (so like, fake async)
local models cannot be run async, since they are compute-bound (at least if they are running in the same process)
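(Concretely, about the most an in-process acomplete() could do is push the blocking call onto a worker thread, something like the sketch below. It keeps the event loop responsive but doesn't give you parallel generation on one GPU; fake_acomplete is just an illustrative name for any object with a synchronous complete() method.)

Python
import asyncio


async def fake_acomplete(llm, prompt: str):
    # "Fake async": the underlying complete() call is still synchronous and
    # compute-bound; a worker thread just stops it from blocking the event loop.
    return await asyncio.to_thread(llm.complete, prompt)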
hmmm... that's odd... because I'm serving Mistral 7B through vllm's OpenAI API server, and if I run requests synchronously it handles one at a time at a throughput of about 60 tokens/sec, but if I run them async, vllm just keeps batching them up and tops out at about 190 tokens/sec
ahhh, in the same process
I think they're using ray to spawn threads
or some sort of concurrency
yea exactly πŸ‘
hmmm, okie dokie, maybe that's a future endeavor to implement some sort of thread spawning for the llm modules in llama index πŸ˜‰

for now, though, I'm following up on a thread on python.org talking about how it's very difficult to track the source of cancelled futures... Guido van Rossum himself is asking people how they think it should be handled, lol

they pointed to a tool called aiomonitor, though, that can help to monitor and track what's going on in all the futures, so I'm attempting to set that up
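the setup is roughly this shape (based on aiomonitor's README; details may differ by version, and main() is just a stand-in for my repro script):

Python
import asyncio

import aiomonitor


async def main():
    ...  # stand-in for the repro: fire off the sub-question query engine here


loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# start_monitor() opens a telnet console (port 50101 by default) where you
# can list running and terminated tasks and print their stacks.
with aiomonitor.start_monitor(loop):
    loop.run_until_complete(main())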
I think in real production use-cases, best to let real LLM providers/servers handle that (TGI, vLLM, Ray, etc.)
that makes sense
I hope aiomonitor helps πŸ™ I would love if this somehow motivates them to fix how hard it is to debug this hahaha
so, aiomonitor appears to have a command called "where terminated" that lets you see the stack trace that initiated the cancellation
trying to reproduce it now to see if it works like it sounds
welllllllll poop
I think nest_asyncio throws a wrench in the works
I only see a single task
these are in the middle of all the subquestions running simultaneously
it seems that they're all bundled into RequestResponseCycle.run_asgi()
... or maybe that's fastapi?
yeah, fastapi bundles them all up into the same task somehow
using my repro script I can see all the tasks
whew, this is confusing
the task doesn't show up on the terminated list for some reason
Here we go... a bug thread from redis that is talking about the same sort of thing
https://github.com/redis/redis-py/issues/2633
apparently there's a bug in anyio
I bet it's a similar thing here... but I'm on Python 3.11.7
@Logan M .... fixed.
upgrading to anyio 4.2.0
they fixed a bug where they were mixing mechanisms for handling task cancellation
since elasticsearch was occasionally failing and needing to retry, it was causing an unhandled task cancellation in the task group, which caused the whole group to fail
with their fix in place everything works as expected
woowwwww hahaha
amazing find!!
Is there a reason our poetry.lock specifies anyio==4.1.0 I wonder? Will have to try updating that
but holy cow... what a dig
my spelling issue was resolved by me prompting it to "identify the proper nouns and spell them out loud for practice" lmao
magic πŸͺ„ 🎩
welp, I was wrong, lol... the spelling is a crap shoot
I'm not exactly sure where to tackle this particular issue... it seems like a fine tune of some sort is in order, but I don't really wanna do a full fine tune on Mistral and then quantize it again
just submitted my application, wish me luck!
Oh nice!! πŸ‘πŸ‘
πŸ˜„ πŸ˜„
happen to have any insight on the spelling issue?
or maybe a thread I can pull on?
Man I really have no idea on that one.... The only thing I can think of is extremely hacky stuff, like replacing the names with something that you can find/replace later on lol
oh! that's actually not a bad idea...
it really stumbles on "retatrutide" and either spells it "retrutide" or "retartrutide", which of course messes up document retrieval from the index
I could just make a little replace function that I drop after the results to globally replace those mis-spellings
so that brute force approach actually works, lol...
I really only need it to be spelled correctly for the index, after which point it can do whatever it wants and I can just correct it again at the end
not elegant, but effective. good idea!
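A rough sketch of that find/replace idea (the function name and the misspelling map are just illustrative):

Python
import re

# Known misspellings -> canonical spelling; extend as new variants show up.
SPELLING_FIXES = {
    "retrutide": "retatrutide",
    "retartrutide": "retatrutide",
}


def normalize_spelling(text: str) -> str:
    """Rewrite known misspellings so retrieval and the final answer use the canonical term."""
    for wrong, right in SPELLING_FIXES.items():
        text = re.sub(wrong, right, text, flags=re.IGNORECASE)
    return text

# Apply to the generated sub-questions before they hit the index, then once
# more on the model's final response.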
Nice! πŸ’ͺ