Rate limits

It's baffling. It's a complete blocker; I'm going to turn to LangChain indexes only to try and solve it.
Even if I wait, when I try again I still hit the limit, even for a very small number of documents that are each very small.
What kind of index are you building? It looks like the LLM is the one hitting rate limits 🤔
Every index that uses GPT hits the limits, no matter how few documents or tokens. I've tried them all, but I primarily want to use the GPTKeywordTableIndex one.
Hmm, not sure what to tell ya 🤔 I've personally not encountered any rate-limit errors, and I haven't seen too many complaints in the Discord about it. Some people have indexed entire books 😅

I agree LlamaIndex could probably have a better mechanism to slow down requests, though, for situations like this.
I would appreciate a bit more logging as well. I have logging set to debug, but I wish I could see more detail about what's going on.
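(For context, setting logging to debug here is just the standard Python logging setup; a minimal sketch:)

```python
import logging
import sys

# Route all DEBUG-level log records (including whatever the index emits) to stdout.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
```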
What details are you interested in?
If I get time, I can look at expanding the debug logs. But also, any PRs from anyone are extremely welcome 🙏🙏
Yes, I've been looking into contributing on the GitHub; I just posted an issue.

Specifically, it would be great to have 'timestamp: request made to OpenAI endpoint' and 'timestamp: response from OpenAI',
and the ability to have that logged during index creation and at query time.
If that already exists and I just haven't enabled it, that would be SO helpful.
Ah awesome! Yea, that doesn't exist yet as far as I know, but it would be easy enough to add 🙏
Amazing! Yes. Right now, whenever I create an index or run a query, I have no idea how many calls are actually being made.
It's been making me want to just build my own implementation of LlamaIndex, which is not the right next step haha
Hahaha yea, that might be a little too much work 😅

One quick option without waiting for a PR is just editing the package yourself.

You can open the installation location on your machine and add whatever print() calls or logging you want to the source code.
Definitely a temporary option, but good for debugging.
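(A quick way to find that install location, assuming the package imports as llama_index; older releases used gpt_index instead:)

```python
import os
import llama_index  # may be gpt_index on older releases

# Print the directory of the installed package so you know where to add
# temporary print() calls or extra logging.
print(os.path.dirname(llama_index.__file__))
```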
I work entirely on Colab these days, so it's a little more annoying to do, but yes, that's my next step. I literally can't run anything on LlamaIndex right now. Rate limited? Shadow banned? I can't figure it out.
appreciate your help!
What's also weird is that I tried with the openai library directly, and those queries go through fine...
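(For reference, the kind of direct sanity check meant here, assuming the pre-1.0 openai Python package; the model name is just an example:)

```python
import openai

openai.api_key = "sk-..."  # your API key

# A single completion request straight to the OpenAI API. If this goes through
# while the index calls fail, the key itself is not hard rate limited.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say hello.",
    max_tokens=5,
)
print(response.choices[0].text)
```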
But those queries would be one at a time, right? A little slower, I suppose.

Since you are running on Colab, I wouldn't be surprised if OpenAI just sees a million requests from Colab and automatically shadow bans them or something silly 😅

If you are on Windows, using WSL works great with LlamaIndex, and removes any Windows-specific complications that come with Python.
When I run a single index.query on a keyword table index,

how many OpenAI calls will that be? Any approximate idea?
But your point on Colab and OpenAI throttling that... that's actually a GREAT POINT.
With default settings, probably about one LLM query per 4000 tokens
ok well, making a single index.query("does this work?") gets me that rate limit error on an index with 5 documents in it, each no more than 100 tokens :/
oh wait, a query! whoops
With a query, it will send one LLM call for every document with a matching keyword (since the documents are small), plus one call to the LLM to extract the keywords from the query.
So roughly 6 max in your case.
You can try index.query("my query", response_mode="compact") to stuff as much as it can into one LLM call.
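(Putting that together, a rough sketch of the keyword-table flow in the GPT-index-era API; depending on the version, construction may be GPTKeywordTableIndex(documents) or GPTKeywordTableIndex.from_documents(documents):)

```python
from llama_index import GPTKeywordTableIndex, SimpleDirectoryReader

# Build time: one LLM call per chunk to extract its keywords.
documents = SimpleDirectoryReader("data").load_data()
index = GPTKeywordTableIndex(documents)

# Query time: one LLM call to extract keywords from the query, then one call
# per matching chunk; response_mode="compact" packs matching chunks together
# to reduce the number of calls.
response = index.query("does this work?", response_mode="compact")
print(response)
```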
Isn't it weird that I'm still getting rate limited from that?
Very weird! I blame Colab at this point lol
kk good to know thanks again.

last question:

I feel like I misunderstand something about LlamaIndex.

So I've built the index; I understand that this takes a bunch of calls and tokens, etc.

Then when I run the index.query,
I expected it to do something smart like sort the index, take the top N best results (all locally),
THEN make an LLM call to summarize the results, then put those into the context + the prompt to get the answer.

How wrong am I? @Logan M, I feel like you're one of two people who know the answer.
So how the query works depends on the index you are using.

If you want it to work like the operation you described, use GPTSimpleVectorIndex

Then, you can write a query like this: index.query("my query", similarity_top_k=3, similarity_cutoff=0.3)

This retrieves the top 3 most relevant nodes, and removes any with a similarity less than 0.3

If you use default settings, the top k is 1 and the cutoff is 0 (i.e. it uses any node it finds).
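(Same era of the API, a minimal sketch of that vector-index flow; again, the exact constructor may differ slightly between versions:)

```python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Build time: each chunk gets an embedding (embedding API calls, not LLM completions).
documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents)

# Query time: the query is embedded, the top 3 nodes above a 0.3 similarity
# cutoff are retrieved locally, and one LLM call synthesizes the answer.
response = index.query("my query", similarity_top_k=3, similarity_cutoff=0.3)
print(response)
```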
Gotcha. And the keyword table index is very different?
A keyword index will return any text chunk that has the same keywords as the query. So quite different, yes.

There's a good page on how each index works over here: https://gpt-index.readthedocs.io/en/latest/guides/index_guide.html
I live on that page
I've spent the past month crawled up inside LlamaIndex's brain. I'm a data scientist who generally knows what he's doing, and I really struggle with getting LlamaIndex + LangChain to play nice.
I want to help improve the docs as well; I think I have some ideas for making the overall explanation a little more approachable.
Just need to get my MVP off the ground, then I can work on improvements!
Oh, that would be very awesome! 🙏
will probably make a medium post
once I understand hahaha
The biggest challenge lol