I keep running into an issue where whenever I set use_async=True for my query engine, I encounter an asyncio.exceptions.CancelledError

The relevant portions of the stack trace seem to be:

Plain Text
File "/path/to/lib/python3.11/site-packages/llama_index/query_engine/sub_question_query_engine.py", line 226, in _aquery_subq
    response = await query_engine.aquery(question)
...
  File "/path/to/lib/python3.11/site-packages/llama_index/response_synthesizers/refine.py", line 325, in aget_response
    response = await self._agive_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/llama_index/response_synthesizers/refine.py", line 430, in _agive_response_single
    structured_response = await program.acall(
                          ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/llama_index/response_synthesizers/refine.py", line 79, in acall
    answer = await self._llm.apredict(
...
  File "/path/to/lib/python3.11/site-packages/openai/_base_client.py", line 1536, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/openai/_base_client.py", line 1315, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lib/python3.11/site-packages/openai/_base_client.py", line 1339, in _request
    response = await self._client.send(
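For reference, a minimal sketch of the kind of setup that triggers this (the directory, tool name, and description are placeholders; the relevant part is passing use_async=True to SubQuestionQueryEngine.from_defaults):

Python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# Placeholder index built from local documents.
documents = SimpleDirectoryReader("./papers").load_data()
index = VectorStoreIndex.from_documents(documents)

tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(name="papers", description="Medical research papers"),
    )
]

# use_async=True runs the generated sub-questions concurrently; this is the
# flag that surfaces the asyncio.exceptions.CancelledError above.
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    use_async=True,
)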
Yea that makes sense

One idea I had was changing this try/except to catch any exception. I have a feeling one of the sub-queries is failing and we just need to find out why

https://github.com/run-llama/llama_index/blob/cd892a51d31cc3aa47b7adf9db60e88d9ee96188/llama_index/query_engine/sub_question_query_engine.py#L236
is there a quick way I can test that? here's where I run into a little bit of my shortcomings w/ python
like ... I can probably hack something together real quick, but if you have a good idea off the top of your head that'd probably be faster πŸ˜„
oh, I guess I can just say query_engine._aquery_subq = hacked_func
oh, poop... gotta actually extend the class
okie dokie, got that set up and testing now
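(For anyone following along, the "extend the class" experiment looks roughly like this. The subclass and logging are a debugging hack, not part of llama_index; the method name _aquery_subq comes from the traceback above and its exact signature may vary by version.)

Python
import logging

from llama_index.query_engine import SubQuestionQueryEngine


class DebugSubQuestionQueryEngine(SubQuestionQueryEngine):
    """Same engine, but logs anything that escapes a sub-question."""

    async def _aquery_subq(self, sub_q, color=None):
        try:
            return await super()._aquery_subq(sub_q, color=color)
        except BaseException:
            # Deliberately broader than `except Exception` so that
            # asyncio.CancelledError gets logged too before the task dies.
            logging.exception("Sub-question failed: %s", sub_q.sub_question)
            return None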
okay, so where it fails, it doesn't actually enter that function
wait, lies, yes it does
but the except didn't catch it?
you can see here what I changed
and there's no warning message saying "tool failed to run question" anywhere
Exception is the base class for all error types, right?
okay, using BaseException in the except caught it...

Plain Text
03:47:17.100 [DEBUG   ]       httpcore.connection - connect_tcp.started host='192.168.0.2' port=8000 local_address=None timeout=180.0 socket_options=None
03:47:17.101 [DEBUG   ]       httpcore.connection - connect_tcp.failed exception=CancelledError()
03:47:17.101 [WARNING ]                      root - [context] Failed to run What are the benefits of Retatrutide?
03:47:17.101 [ERROR   ]                      root -


apparently I failed to log the error though πŸ˜…
not much more info than we already know from logging the exception
on the upside, that change prevents the crash
on the downside, still not sure why the future is being marked as cancelled
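(Side note for readers: the reason `except Exception` missed it is that asyncio.CancelledError has inherited from BaseException rather than Exception since Python 3.8. A more targeted hack than catching BaseException would be to catch the cancellation explicitly; a rough sketch with a hypothetical wrapper:)

Python
import asyncio
import logging


async def run_subquestion(query_engine, question):
    # Hypothetical wrapper just to illustrate the exception hierarchy.
    try:
        return await query_engine.aquery(question)
    except asyncio.CancelledError:
        # Not an Exception subclass, so a plain `except Exception` never sees it.
        logging.warning("Sub-question cancelled: %s", question)
        raise  # re-raise so cancellation still propagates properly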
Dang πŸ€” well, at least we made it work

One thing I was maybe thinking of was to use non-local embeddings (i.e. OpenAI).

Another option might be to try setting use_async=False and see if there's some other error...
I think I need to go and read more on how to debug this error properly lol
haha, fair enough
use_async=False has been my workaround, but it tends to cause my request to time out: with async it returns in under a minute, but without async it can take upwards of 10 minutes
what is your thought on the non-local embeddings, though?
on a small tangent, I've noticed that since Mistral isn't familiar with the term "Retatrutide", it really struggles with spelling it correctly, which leads to it looking up lots of bizarre stuff from the store... I was thinking that it must be an issue with the tokenizer having not seen it enough.
Is that an English word? πŸ˜…
Local embeddings aren't actually async; I was thinking running multiple sub-queries might be causing a weird race condition in huggingface 🤔 I mean, I'm not sure how that would happen either though lol
haha, it's the name of a medicine. The bot I'm working on is a QA bot to help with research on different medicines and their interactions... Currently have roughly 1000 research papers ingested into my store and if I ask it a question about something the model was likely trained on (i.e. it has a wiki page.. like albuterol) it nails it, but retatrutide is new and currently in phase 3 trials, so it probably hasn't come across it much, if at all
Ah, I see.. my thought process was to use the vllm module inside llama index directly instead of accessing it through http.. just sorta eliminate that variable
Ah yea, that's a good idea too πŸ‘ except without http you also lose async (so it will run sequentially essentially right?)
Oh, is that right?
Welp, dang. I didn't get that far in my research, lol
ah, I see that acomplete is marked as Not Implemented... is there any reason why, or is it just not implemented yet?
If it were implemented, it could only call complete() secretly (so like, fake async)
local models cannot be run async, since they are compute-bound (at least if they are running in the same process)
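(Concretely, about the most an in-process acomplete() could do is push the blocking call onto a worker thread, something like the sketch below. It keeps the event loop responsive but doesn't give you parallel generation on one GPU; fake_acomplete is just an illustrative name for any object with a synchronous complete() method.)

Python
import asyncio


async def fake_acomplete(llm, prompt: str):
    # "Fake async": the underlying complete() call is still synchronous and
    # compute-bound; a worker thread just stops it from blocking the event loop.
    return await asyncio.to_thread(llm.complete, prompt)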
hmmm... that's odd... because I'm serving Mistral 7B through vllm's OpenAI API server, and if I run requests synchronously it handles one at a time at a throughput of about 60 tokens/sec, but if I run them async, vllm just keeps batching them up and tops out at about 190 tokens/sec
ahhh, in the same process
I think they're using ray to spawn threads
or some sort of concurrency
yea exactly πŸ‘
hmmm, okie dokie, maybe that's a future endeavor to implement some sort of thread spawning for the llm modules in llama index πŸ˜‰

for now, though, I'm following up on a thread on python.org talking about how it's very difficult to track the source of cancelled futures... Guido van Rossum himself is asking people how they think it should be handled, lol

they pointed to a tool called aiomonitor, though, that can help to monitor and track what's going on in all the futures, so I'm attempting to set that up
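the setup is roughly this shape (based on aiomonitor's README; details may differ by version, and main() is just a stand-in for my repro script):

Python
import asyncio

import aiomonitor


async def main():
    ...  # stand-in for the repro: fire off the sub-question query engine here


loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# start_monitor() opens a telnet console (port 50101 by default) where you
# can list running and terminated tasks and print their stacks.
with aiomonitor.start_monitor(loop):
    loop.run_until_complete(main())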
I think in real production use-cases, best to let real LLM providers/servers handle that (TGI, vLLM, Ray, etc.)
that makes sense
I hope aiomonitor helps πŸ™ I would love if this somehow motivates them to fix how hard it is to debug this hahaha
so, aiomonitor appears to have a command called "where terminated" that lets you see the stack trace that initiated the cancellation
trying to reproduce it now to see if it works like it sounds
welllllllll poop
I think nest_asyncio throws a wrench in the works
I only see a single task
these are in the middle of all the subquestions running simultaneously
it seems that they're all bundled into RequestResponseCycle.run_asgi()
... or maybe that's fastapi?
yeah, fastapi bundles them all up into the same task somehow
using my repro script I can see all the tasks
whew, this is confusing
the task doesn't show up on the terminated list for some reason
Here we go... a bug thread from redis that is talking about the same sort of thing
https://github.com/redis/redis-py/issues/2633
apparently there's a bug in anyio
I bet it's a similar thing here... but I'm on Python 3.11.7
@Logan M .... fixed.
upgrading to anyio 4.2.0
they fixed a bug where they were mixing mechanisms for handling task cancellation
since elasticsearch was occasionally failing and needing to retry, it was causing an unhandled task cancellation in the task group, which caused the whole group to fail
with their fix in place everything works as expected
woowwwww hahaha
amazing find!!
Is there a reason our poetry.lock specifies anyio==4.1.0 I wonder? Will have to try updating that
but holy cow... what a dig
my spelling issue was resolved by me prompting it to "identify the proper nouns and spell them out loud for practice" lmao
magic πŸͺ„ 🎩
welp, I was wrong, lol... the spelling is a crap shoot
I'm not exactly sure where to tackle this particular issue... it seems like a fine tune of some sort is in order, but I don't really wanna do a full fine tune on Mistral and then quantize it again
just submitted my application, wish me luck!
Oh nice!! πŸ‘πŸ‘
πŸ˜„ πŸ˜„
happen to have any insight on the spelling issue?
or maybe a thread I can pull on?
Man I really have no idea on that one.... The only thing I can think of is extremely hacky stuff, like replacing the names with something that you can find/replace later on lol
oh! that's actually not a bad idea...
it really stumbles on "retatrutide" and either spells it "retrutide" or "retartrutide", which of course messes up document retrieval from the index
I could just make a little replace function that I drop after the results to globally replace those mis-spellings
so that brute force approach actually works, lol...
I really only need it to be spelled correctly for the index, after which point it can do whatever it wants and I can just correct it again at the end
not elegant, but effective. good idea!
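A rough sketch of that find/replace idea (the function name and the misspelling map are just illustrative):

Python
import re

# Known misspellings -> canonical spelling; extend as new variants show up.
SPELLING_FIXES = {
    "retrutide": "retatrutide",
    "retartrutide": "retatrutide",
}


def normalize_spelling(text: str) -> str:
    """Rewrite known misspellings so retrieval and the final answer use the canonical term."""
    for wrong, right in SPELLING_FIXES.items():
        text = re.sub(wrong, right, text, flags=re.IGNORECASE)
    return text

# Apply to the generated sub-questions before they hit the index, then once
# more on the model's final response.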
Nice! πŸ’ͺ