
At a glance

The community member is experiencing an issue with the LlamaIndex library, where a single question is generating multiple requests, exceeding the OpenAI rate limit of 3 requests per minute. The comments suggest that this is likely due to the default "agent" chat mode, which involves multiple steps, including deciding which tool to use, querying the index, and interpreting the result.

The community members discuss potential solutions, such as trying different chat modes, adding funds to the OpenAI account, and using a token counting handler to monitor the token usage. They also provide code snippets and explanations about the underlying process, including the retrieval of multiple nodes and the interpretation of the response.

There is no explicitly marked answer in the provided information.

Useful resources
Why does LlamaIndex make so many requests for 1 question? This means I can't get an answer, because OpenAI's limit only allows 3 requests per minute
Attachment: image.png
I'm guessing you just did index.as_chat_engine() ?

By default, this is an agent. Which means 1 LLM call to decide which tool to use (i.e. the index), at least one call to query the index, and one call to interpret the result.

Maybe try another chat mode, or add some money to your openai account and set a low $ limit, to get around the rate limits
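Some quick arithmetic on why the default agent mode collides with the free-tier limit from the question (the constants below just restate numbers from this thread):

```python
# Why a 3-requests-per-minute limit is exhausted by a single question.
RATE_LIMIT_RPM = 3       # OpenAI free-tier limit mentioned in the question
CALLS_PER_QUESTION = 3   # minimum LLM calls per user message in agent mode

questions_per_minute = RATE_LIMIT_RPM // CALLS_PER_QUESTION
print(questions_per_minute)  # 1 -- at most one question per minute, with zero headroom
```

Any extra call (an accidental double request, a second retrieval round) pushes a single question over the limit.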
Oops maybe because I accidentally called the response twice too πŸ˜…
Yes, you're right, I'm currently using index.as_chat_engine()
Does anyone know why the tokens used increase so much on the third attempt? Why is the ~2000-token LLM prompt being sent twice, even though the contents of my document should be the same?
can you send me the code snippet for getting this token usage stats in llama index?
thanks a lot
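In LlamaIndex, these stats come from a TokenCountingHandler registered on the callback manager; as a library-free sketch of what that handler does, here is a minimal stand-in (SimpleTokenCounter and record_llm_call are hypothetical names, and the 4-characters-per-token heuristic replaces the real tokenizer):

```python
# Stand-in sketch of LlamaIndex's TokenCountingHandler: it hooks LLM events
# and accumulates prompt/completion token counts across calls.
class SimpleTokenCounter:
    def __init__(self):
        self.prompt_llm_token_count = 0
        self.completion_llm_token_count = 0

    @staticmethod
    def _approx_tokens(text: str) -> int:
        # Crude heuristic: roughly 4 characters per token for English text.
        return max(1, len(text) // 4)

    def record_llm_call(self, prompt: str, completion: str) -> None:
        self.prompt_llm_token_count += self._approx_tokens(prompt)
        self.completion_llm_token_count += self._approx_tokens(completion)

    @property
    def total_llm_token_count(self) -> int:
        return self.prompt_llm_token_count + self.completion_llm_token_count


counter = SimpleTokenCounter()
counter.record_llm_call("context " * 500, "short answer")  # 4000 + 12 chars
print(counter.total_llm_token_count)  # 1003
```

The real handler exposes the same kind of totals (prompt, completion, and overall LLM token counts), which is what the screenshots in this thread are showing.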
This is the query engine -- it's retrieving two nodes (likely 1024 tokens each) and sending them to the LLM along with the query to answer
Why only the third attempt retrieving two nodes?
It's not a third attempt, actually. The default chat engine (an agent) has the 3 steps I described earlier:

  1. Read user message, either generate a response or write an input to a tool (i.e. the tool in this case is a query engine)
  2. Run the tool with the query (i.e. run the query engine, which retrieves + writes response)
  3. Interpret response in context of previous chat history, give user final answer
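The three steps above can be sketched as a toy loop that counts LLM calls per user turn (FakeLLM and run_agent_turn are illustrative names, not LlamaIndex API):

```python
# Toy simulation of the 3-step agent loop, counting LLM calls per user message.
class FakeLLM:
    def __init__(self):
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"response to: {prompt[:20]}"


def run_agent_turn(llm: FakeLLM, user_message: str) -> str:
    # Step 1: read the user message, pick a tool, and write its input.
    tool_input = llm.complete(f"Pick a tool for: {user_message}")
    # Step 2: run the tool -- the query engine itself calls the LLM
    # to synthesize an answer over the retrieved nodes.
    tool_output = llm.complete(f"Answer from index for: {tool_input}")
    # Step 3: interpret the tool output in chat context, give the final answer.
    return llm.complete(f"Final answer given: {tool_output}")


llm = FakeLLM()
run_agent_turn(llm, "Why so many requests?")
print(llm.calls)  # 3 -- three LLM calls for a single user message
```

So one user message consumes the entire 3-requests-per-minute allowance, which is exactly the symptom in the original question.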
Definitely check out other chat modes though
Thank you for the information