The community member asks whether llama-index handles OpenAI API retries in the query engine when a RateLimitError is raised. They note that there is some code in openai_utils.py to handle retries, but they also see that llama-index uses the langchain OpenAI API wrapper, which appears to let the RateLimitError propagate. The community member is unsure whether they should handle the RateLimitError in their own code.
In the comments, another community member questions whether llama-index actually uses the langchain OpenAI wrapper, as they thought all the LLM code was in-house. They also note that the current retry logic is fairly basic and may retry too quickly to properly handle a rate limit error, and they would welcome a pull request to improve it.
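For reference, a minimal sketch of what a more patient retry could look like, using tenacity's exponential backoff with jitter. This is a hypothetical illustration, not llama-index's actual code, and it assumes the pre-1.0 openai package (the one contemporary with the langchain wrapper), where the exception lives at `openai.error.RateLimitError`; openai>=1.0 exposes it as `openai.RateLimitError` instead.

```python
# Hypothetical sketch of backoff-based retry; not llama-index's actual code.
# Assumes the pre-1.0 openai package (openai.error.RateLimitError).
import openai
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_random_exponential,
)

@retry(
    retry=retry_if_exception_type(openai.error.RateLimitError),
    wait=wait_random_exponential(min=1, max=60),  # wait 1-60s with jitter
    stop=stop_after_attempt(6),                   # give up after 6 attempts
    reraise=True,                                 # surface the last error
)
def completion_with_backoff(**kwargs):
    # Retry only on rate limits; other errors propagate immediately.
    return openai.Completion.create(**kwargs)
```

The random exponential wait spreads retries out over time, which avoids the "retry too quickly" problem the commenter describes.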
Another community member mentions that they encountered a similar RateLimitError a few days ago, but did not save the details. They provide a link to an example that may be related.
Finally, a community member suggests that the openai_utils.py file is only used for direct OpenAI queries, and may not be used in the query engine.
There is no explicitly marked answer in the post or comments.
Hi all. Does llama-index handle OpenAI API retries in the query engine for RateLimitError? I see some code in the repo (openai_utils.py) that handles retries. But I also see that llama-index uses the langchain OpenAI API wrapper, which seems to throw RateLimitError. Should I handle it in my own code?
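If the query engine path does not retry for you, a minimal sketch of handling it in your own code is to wrap the query call yourself. The helper name and parameters below are hypothetical, and the `openai.error.RateLimitError` import path again assumes the pre-1.0 openai package:

```python
import time
import openai

def query_with_backoff(query_engine, question, max_retries=5, base_delay=1.0):
    """Hypothetical helper: retry a query engine call on rate limits."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return query_engine.query(question)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, propagate the error
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
```

Here `query_engine` is whatever `index.as_query_engine()` returned; the wrapper keeps your calling code unchanged while absorbing transient rate limits.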