What code did you run to hit this? You'll likely need to add some sleep so the data gets processed a little more slowly.
I'm using the summary index in tree_summarize mode, attached to a router query engine, inside a condense chat engine
How should I proceed in my case? What's the best practice here?
ah, and is creating the summary index probably what's causing this error?
not creating but querying it
Might have to set use_async=False
-- although tbh we should probably be better at handling this.
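e.g. something like this (a rough sketch against the pre-0.10 llama_index API; `documents` is just a stand-in for whatever you loaded):
```python
from llama_index import SummaryIndex

# Build the summary index as before (hypothetical `documents` list)
index = SummaryIndex.from_documents(documents)

# use_async=False makes tree_summarize issue its LLM calls sequentially,
# so it burns far fewer tokens per minute (at the cost of latency)
query_engine = index.as_query_engine(
    response_mode="tree_summarize",
    use_async=False,
)
```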
It seems like a document this big is causing too many API requests at a given time
According to the trace, the error happens during the use of the synthesizer, after many retries
Yeah, it's linked to using too many tokens too fast
Without async, it's kind of wild that it's using that much in a minute though
it happens on both gpt-3.5-turbo (1106) and gpt-4-turbo-preview
Did you set max_tokens in the LLM definition?
this is what happens multiple times after some calls
Kind of weird there are so many retries. The default is 3 retries
Maybe try decreasing max_tokens to avoid the rate limit
Or, you can set max_retries in the LLM to be higher (default is 3)
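For example (sketch; the model name is just an example):
```python
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Bump retries so rate-limit errors back off and retry instead of failing
llm = OpenAI(
    model="gpt-3.5-turbo-1106",
    max_tokens=2048,
    max_retries=10,  # default is 3
)
service_context = ServiceContext.from_defaults(llm=llm)
```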
no, it's just a logging bug; there's only one retry in reality
max_tokens is only set to 2048
Right, but it's requesting 2048 tokens so many times in a minute that it's hitting the rate limit
(I think, anyways, that's my hypothesis haha)
Yeah, will try these solutions, thanks for always being here!
Actually, a better solution might be keeping max_tokens at 2048 (I'm assuming you were trying to avoid cut-off responses) and instead artificially lowering the context window
service_context = ServiceContext.from_defaults(..., context_window=10000)
gpt-3.5-turbo-1106 has a 16K context window.
tree_summarize is nearly stuffing the context window in every LLM call (the error above said it requested like 12K tokens)
So if we artificially lower the context window, each request will consume fewer tokens, and hopefully stay under the token limit
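Fleshed out, that looks roughly like this (a sketch; keep whatever else you already pass to from_defaults):
```python
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Keep max_tokens at 2048 for full answers, but tell llama_index the model
# only has a 10K window, so tree_summarize packs less context per call
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-1106", max_tokens=2048),
    context_window=10000,
)
```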
waiting longer seems to help
I am sorry, but I cannot provide a full summary of the thesis as this would involve dealing with a large volume of information and specific details. However, I can help you summarise specific parts of the thesis or answer questions on particular topics covered in it. Feel free to ask more targeted questions or request summaries of specific sections.
What was the query? Maybe we can modify it slightly so it doesn't think of it as summarizing an entire thesis? query_engine.query("Highlight the important details from the provided text")
The query was "generate a complete summary of the thesis", so yeah, maybe it is too much
seems to work better but the summary is very short
hmm yeah, might need some prompt tweaks
And lol, with gpt-4-turbo-preview I get the error even with the first fix
trying to set max_retries to 100 lol
the new gpt-4 has a 128K context window, right? Then you probably really need to artificially shrink it
so like you explained here?
it could hit your 60,000 tokens-per-minute limit in a single LLM call, since one tree_summarize request can stuff most of that 128K window
That's what I thought, for my poor tier-1 plan
And just saw gpt-3.5-turbo-instruct is 250,000 TPM
and just saw it was also 500k TPD, so the limit was always being reached
(in this very very specific case)