
OpenAI rate limit

Hi Logan, thanks again for replying to my question! I'm on the free trial but I have $15 remaining. Does this error message suggest that my remaining balance is far from what is required?
Nope! But the free trials seem to have rate limits (which is very annoying); it seems like it's limited to 60 requests a minute
Okay, but I've definitely waited more than a minute before rerunning the code, and this error still comes up.
What type of index is it?
list index. I'm still working on the review dataset. Thanks for helping me figure out the "float" error yesterday!
yea no worries!

With a list index, when you specify mode="embedding", it first calculates the embedding for every chunk, and then calls the LLM for every top-k chunk (the default top k is 1)

So it's making at least num_chunks calls to the LLM (plus however many batched embedding requests it ends up being)

If your CSV has quite a few rows, this will probably hit that rate limit pretty quickly I think 🤔

Maybe try increasing the embed batch size

Plain Text
from llama_index import ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding

...

embed_model = OpenAIEmbedding(embed_batch_size=2000)

...

service_context = ServiceContext.from_defaults(..., embed_model=embed_model)
That should reduce the number of api calls quite a bit lol (the default batch size is 10)
Hopefully I spelled all that correctly lol typing on my phone
Yeah the spelling is correct! But the error message still came out. 😫

Actually I also tried to build an index on only a subset of reviews (only reviews about a particular insurance company). And I didn't set the mode, leaving it pretty simple. However, the error still came out. Except this time it says the limit is 3/min, whereas the original error said the limit was 60/min.
Attachment
image.png
oh no lol they are decreasing your limit LOL
Not sure tbh. Other than putting your payment info on your account πŸ€”
I added payment info and now the code can run! But I found that when I set the mode to "embedding", the answer is very short and sometimes not correct. (I asked it to list 3 reviews out of 10 that mentioned price-related topics and it could only find 1.)

After I deleted the "mode" parameter, the code has been running for like 5 minutes and it is still going...

Wondering if you have any suggestions on how to balance code running time and accuracy? Thanks a lot Logan!
Progress!

So without the mode, the list index will check every node in the index (which can be slow!). This is usually good for when you need to generate a summary (with response_mode="tree_summarize")

Maybe you'd have better luck with GPTSimpleVectorIndex

By default it computes the embeddings at index construction, and then at query time fetches the top k closest chunks.

Try a query like this maybe response = index.query(..., similarity_top_k=3, response_mode="compact")

You'll probably also get better results if you set the chunk_size_limit in the service context. I would try 1024 as a starting point (this helps reduce LLM token usage, and usually makes the embeddings better, the default chunk size is 3900 which is a little big)
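Something like this, roughly (a sketch assuming the older GPTSimpleVectorIndex/ServiceContext API; the query string is just a placeholder and documents is your list of Document objects):

Plain Text
from llama_index import GPTSimpleVectorIndex, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(embed_batch_size=2000)

# smaller chunks -> fewer LLM tokens per call and usually better embeddings
service_context = ServiceContext.from_defaults(
    chunk_size_limit=1024,
    embed_model=embed_model,
)

# embeddings are computed once here, at index construction
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# at query time, only the top k closest chunks get sent to the LLM
response = index.query(
    "Which reviews mention price?",
    similarity_top_k=3,
    response_mode="compact",
)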
Got it! Let me try that! Thank you so much! Also wondering what does "max_chunk_overlap" do?
During query time, if the text retrieved doesn't fit into a single LLM call (due to token limits), it is split into overlapping chunks, according to that value from the prompt helper πŸ’ͺ

Then an answer is refined across all the chunks
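For reference, that value lives on the prompt helper (rough sketch with the old PromptHelper arguments; the numbers are just example values):

Plain Text
from llama_index import PromptHelper, ServiceContext

# max_input_size: the model's context window
# num_output: room reserved for the LLM's answer
# max_chunk_overlap: token overlap between consecutive chunks when retrieved
#                    text has to be split across multiple LLM calls
prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=20)

service_context = ServiceContext.from_defaults(prompt_helper=prompt_helper)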
I changed the index to SimpleVectorIndex but it can still only find one review that is about price.... But at least now it doesn't run forever lol.

Just trying to see if my setup is correct:

I iterated over each review and applied "Document()" to each one. Eventually I have a "documents" list with many Document objects, one for each review. I then created the index by loading this "documents".

In this case, I'm curious whether the AI will look over one review at a time and fetch the ones that mention price? If so, will the chunk-related parameters prevent some lengthy reviews from being checked by the AI?
Attachments
image.png
image.png
image.png
Ah, so if each document is a review, your query only fetched one review.

Try increasing that (and probably quite a bit, since the reviews are short)

index.query(..., similarity_top_k=10, response_mode="compact")

This will fetch the 10 closest reviews that match the query. Then, the compact mode stuffs as much review text as possible into each LLM call as context, rather than making one call per top-k node

After it gets an answer from the first llm call, if there is still more text to send, it sends the existing answer + query + new context, and asks the llm to update the answer
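In code, the setup you described plus that query would look roughly like this (a sketch; the dataframe column name and the query text are made up, and service_context is the one from earlier):

Plain Text
from llama_index import Document, GPTSimpleVectorIndex

# one Document per review, as you described
documents = [Document(text) for text in df["review_text"]]

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# fetch the 10 closest reviews, then "compact" packs as many of them as
# possible into each LLM call and refines the answer across calls
response = index.query(
    "List the reviews that mention price.",
    similarity_top_k=10,
    response_mode="compact",
)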
Oh magic! Now it works much much better! 🤩 Many more reviews are returned. Thank you so much Logan!

One more question on optimizing this: I assumed that if I set num_output to 256, it wouldn't return those lengthy reviews. Therefore I set num_output to 4096 to see if I could get some longer reviews returned, but this error came out
Attachment
image.png
Glad it works, nice!!

num_output is related to how much room the LLM has to generate. If you set max_tokens in the ChatOpenAI definition to something like 512, that means the LLM can generate up to 512 tokens (the default is 256).

For OpenAI models, the input and output are connected. Tokens are generated one at a time, and each generated token is added to the input before generating the next token. num_output ensures there is room for this to happen.

max_tokens and num_output should always be set to the same value 👍
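So the wiring looks something like this (a sketch using the old LLMPredictor/PromptHelper setup; 512 is just an example value):

Plain Text
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext

# let the LLM generate up to 512 tokens...
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", max_tokens=512))

# ...and reserve the same 512 tokens of room via num_output
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)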
Okay, but I found there is no parameter called "max_tokens" in the prompt helper. Right now I set max_input_size and num_output to the same value and it returned this error
Attachment
image.png
Right, max_tokens is set in the ChatOpenAI/OpenAI definition 👍 (oh, another good catch: usually you'll want to do from langchain.chat_models import ChatOpenAI instead of OpenAI for gpt-3.5)

max_input_size can stay at 4096
Got it! I can't thank you enough Logan!

Super weird finding: I asked the model to return the longest review that mentions anything related to price, and it couldn't return that review and returned something much shorter instead, which I thought was because of a token-limit-related reason.

However, when I asked it to return reviews related to price increases, it then returned that one.

I guess this is where prompt engineering becomes necessary lol.
Attachment
image.png
Heh definitely πŸ˜…
Actually I found importing OpenAI gives me better responses. But anyway, thanks for the heads up!