is there a quick way I can test that? here's where I run into a little bit of my shortcomings w/ python
like ... I can probably hack something together real quick, but if you have a good idea off the top of your head that'd probably be faster
oh, I guess I can just say query_engine._aquery_sub = hacked_func
oh, poop... gotta actually extend the class
okie dokie, got that set up and testing now
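(For reference, a minimal sketch of the subclass-and-override approach — the class here is a hypothetical stand-in, not the real llama_index API, and `_aquery_sub` is just the method name from this chat:)

```python
import asyncio

# Hypothetical stand-in for the real query engine class; the actual
# llama_index class and method signature may differ.
class QueryEngine:
    async def _aquery_sub(self, question: str) -> str:
        return f"real answer to {question!r}"

# Override in a subclass rather than patching the instance, since the
# engine calls the method internally on self.
class InstrumentedQueryEngine(QueryEngine):
    async def _aquery_sub(self, question: str) -> str:
        print(f"entered _aquery_sub with {question!r}")  # debug hook
        return await super()._aquery_sub(question)

engine = InstrumentedQueryEngine()
print(asyncio.run(engine._aquery_sub("test")))
```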
okay, so where it fails, it doesn't actually enter that function
but the except didn't catch it?
you can see here what I changed
and there's no warning message saying "tool failed to run question" anywhere
Exception is the base class for all error types, right?
okay, using BaseException in the except caught it...
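(That behavior is by design: since Python 3.8, asyncio.CancelledError inherits from BaseException rather than Exception, precisely so that a broad `except Exception` can't swallow cancellation. A quick demo:)

```python
import asyncio

# Since Python 3.8, CancelledError is a BaseException, not an Exception,
# so `except Exception` lets cancellation fly right past.
assert not issubclass(asyncio.CancelledError, Exception)
assert issubclass(asyncio.CancelledError, BaseException)

def classify(exc: BaseException) -> str:
    try:
        raise exc
    except Exception:
        return "caught by Exception"
    except BaseException:
        return "caught by BaseException"

print(classify(ValueError("boom")))        # caught by Exception
print(classify(asyncio.CancelledError()))  # caught by BaseException
```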
03:47:17.100 [DEBUG ] httpcore.connection - connect_tcp.started host='192.168.0.2' port=8000 local_address=None timeout=180.0 socket_options=None
03:47:17.101 [DEBUG ] httpcore.connection - connect_tcp.failed exception=CancelledError()
03:47:17.101 [WARNING ] root - [context] Failed to run What are the benefits of Retatrutide?
03:47:17.101 [ERROR ] root -
apparently I failed to log the error though
not much more info than we already know from logging the exception
on the upside, that change prevents the crash
on the downside, still not sure why the future is being marked as cancelled
Dang, well, at least we made it work
One thing I was maybe thinking of was to use non-local embeddings (i.e. OpenAI).
Another option might be to try setting use_async=False
and see if there's some other error...
I think I need to go and read more on how to debug this error properly lol
use_async=False has been my workaround, but it tends to cause my request to time out: with async it returns in under a minute, but without async it can take upwards of 10 minutes
what is your thought on the non-local embeddings, though?
on a small tangent, I've noticed that since Mistral isn't familiar with the term "Retatrutide", it really struggles with spelling it correctly, which leads to it looking up lots of bizarre stuff from the store... I was thinking that it must be an issue with the tokenizer having not seen it enough.
Is that an English word?
Local embeddings aren't actually async; I was thinking running multiple sub-queries might be causing a weird race condition in huggingface. I mean, I'm not sure how that would happen though either lol
haha, it's the name of a medicine. The bot I'm working on is a QA bot to help with research on different medicines and their interactions... Currently have roughly 1000 research papers ingested into my store and if I ask it a question about something the model was likely trained on (i.e. it has a wiki page.. like albuterol) it nails it, but retatrutide is new and currently in phase 3 trials, so it probably hasn't come across it much, if at all
Ah, I see.. my thought process was to use the vllm module inside llama index directly instead of accessing it through http.. just sorta eliminate that variable
Ah yeah, that's a good idea too, except without http you also lose async (so it will run sequentially, essentially, right?)
Welp, dang. I didn't get that far in my research, lol
ah, I see that acomplete is marked as Not Implemented... is there any reason why, or is it just not implemented yet?
If it was implemented, it could only call complete()
secretly (so like, fake async)
local models cannot be run async, since they are compute-bound (at least if they are running in the same process)
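(For what it's worth, here's what a "fake async" acomplete could look like — a hypothetical sketch, not llama_index's actual code. It just pushes the blocking call onto a worker thread, which keeps the event loop responsive but, as noted, buys no real parallelism for compute-bound in-process models:)

```python
import asyncio
import time

# Hypothetical stand-in for the synchronous complete() call.
def complete(prompt: str) -> str:
    time.sleep(0.05)  # pretend inference latency
    return f"answer: {prompt}"

# "Fake async": run the blocking call in a worker thread so the event
# loop isn't blocked. For a compute-bound local model in the same
# process the GIL means this gives no real speedup, which is presumably
# why it's left unimplemented rather than faked.
async def acomplete(prompt: str) -> str:
    return await asyncio.to_thread(complete, prompt)

async def main():
    return await asyncio.gather(*(acomplete(f"q{i}") for i in range(3)))

print(asyncio.run(main()))  # ['answer: q0', 'answer: q1', 'answer: q2']
```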
hmmm... that's odd... because I'm serving Mistral 7B through vllm's openai api service and if I run it synchronously it handles a single request and has a throughput of about 60 tokens/sec, but if I run it async, vllm seems to keep just adding them up and then it tops out at about 190 tokens/sec
ahhh, in the same process
I think they're using ray to spawn threads
or some sort of concurrency
hmmm, okie dokie, maybe that's a future endeavor to implement some sort of thread spawning for the llm modules in llama index
for now, though, I'm following up on a thread on python.org talking about how it's very difficult to track the source of cancelled futures... Guido van Rossum himself is asking people how they think it should be handled, lol
they pointed to a tool called aiomonitor, though, that can help to monitor and track what's going on in all the futures, so I'm attempting to set that up
I think in real production use-cases, best to let real LLM providers/servers handle that (TGI, vLLM, Ray, etc.)
I hope aiomonitor helps! I would love if this somehow motivates them to fix how hard it is to debug this hahaha
so, aiomonitor appears to have a command called "where terminated" that lets you see the stacktrace that initiated the cancellation
trying to reproduce it now to see if it works like it sounds
I think nest_asyncio throws a wrench in the works
these are in the middle of all the subquestions running simultaneously
it seems that they're all bundled into RequestResponseCycle.run_asgi()
... or maybe that's fastapi?
yeah, fastapi bundles them all up into the same task somehow
using my repro script I can see all the tasks
the task doesn't show up on the terminated list for some reason
apparently there's a bug in anyio
I bet it's a similar thing here... but I'm using 3.11.7
they fixed a bug where they were mixing mechanisms for handling task cancellation
since elasticsearch was occasionally failing and needing to retry, it was causing an unhandled task cancellation in the task group, which caused the whole group to fail
with their fix in place everything works as expected
Is there a reason our poetry.lock specifies anyio==4.1.0, I wonder? Will have to try updating that
but holy cow... what a dig
my spelling issue was resolved by me prompting it to "identify the proper nouns and spell them out loud for practice" lmao
welp, I was wrong, lol... the spelling is a crap shoot
I'm not exactly sure where to tackle this particular issue... it seems like a fine tune of some sort is in order, but I don't really wanna do a full fine tune on Mistral and then quantize it again
just submitted my application, wish me luck!
happen to have any insight on the spelling issue?
or maybe a thread I can pull on?
Man I really have no idea on that one.... The only thing I can think of is extremely hacky stuff, like replacing the names with something that you can find/replace later on lol
oh! that's actually not a bad idea...
it really stumbles on "retatrutide" and either spells it "retrutide" or "retartrutide", which of course messes up document retrieval from the index
I could just make a little replace function that I drop after the results to globally replace those misspellings
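(Something along these lines — the correction map is just the misspellings mentioned in this chat, and the function name is made up:)

```python
import re

# Hypothetical correction map built from misspellings actually observed.
CORRECTIONS = {
    "retrutide": "retatrutide",
    "retartrutide": "retatrutide",
}

# One compiled alternation, longest key first so overlapping misspellings
# can't shadow each other; case-insensitive to also catch "Retrutide".
_pattern = re.compile(
    "|".join(sorted((re.escape(k) for k in CORRECTIONS), key=len, reverse=True)),
    re.IGNORECASE,
)

def fix_spelling(text: str) -> str:
    return _pattern.sub(lambda m: CORRECTIONS[m.group(0).lower()], text)

print(fix_spelling("Studies of retrutide and Retartrutide..."))
# Studies of retatrutide and retatrutide...
```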
so that brute force approach actually works, lol...
I really only need it to be spelled correctly for the index, after which point it can do whatever it wants and I can just correct it again at the end
not elegant, but effective. good idea!
Hello @Logan M I wanna work with you