Find answers from the community

Updated 2 years ago


At a glance

The community member is building a GPTSimpleVectorIndex using the langchain CohereEmbeddings embed_model, but the index creation process spams the Cohere API without rate limiting, causing the API to refuse requests. The community members discuss potential solutions, such as adding the documents one by one with a delay, and contributing a rate limiting feature to the library. However, there is no explicitly marked answer in the comments.

Hey guys, I'm building a GPTSimpleVectorIndex over a larger corpus of text using the langchain CohereEmbeddings embed_model.
When the index is created, it spams the Cohere API without any rate limiting, and at some point the API just refuses to take requests from me.
Is there a way to rate-limit requests, or if not, would it be something worth contributing?
4 comments
Well I guess I can simply add the documents one by one with something like:

Plain Text
import time

# SimpleDirectoryReader returns a reader; load_data() yields the documents
documents = SimpleDirectoryReader(...).load_data()
index = GPTSimpleVectorIndex([], embed_model=embed_model)

for d in documents:
    index.insert(d)
    time.sleep(0.1)  # crude delay between embedding calls
@ephe_meral we don't have rate-limit handling for Cohere yet, but we do for OpenAI. If you do want to add a contribution, you can take a look at the usage of retry_on_exceptions_with_backoff in LLMPredictor in gpt_index.langchain_helpers.chain_wrapper.py!
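For reference, a retry-with-exponential-backoff helper in the spirit of retry_on_exceptions_with_backoff could be sketched like this (a minimal illustration, not the library's actual implementation; the function name and parameters here are made up):

```python
import random
import time


def retry_with_backoff(fn, max_tries=5, base_delay=1.0, exceptions=(Exception,)):
    """Call fn(); on one of the given exceptions, sleep and retry.

    Sleep time grows exponentially with each attempt, plus random
    jitter so many clients don't retry in lockstep.
    """
    for attempt in range(max_tries):
        try:
            return fn()
        except exceptions:
            if attempt == max_tries - 1:
                raise  # out of retries, propagate the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

A Cohere-specific version would pass the relevant Cohere error class in `exceptions` instead of catching everything.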
Not sure where to best put this, though. To reduce code duplication, I guess LangchainEmbedding could make sense? However, I may need to import cohere.error.CohereError specifically if we don't want to just catch Exception - but we don't currently have a dependency on cohere AFAIK.
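As a rough illustration of the wrapper idea, throttling could live in a small class around whatever embed callable is in use (this is a hypothetical sketch; RateLimitedEmbedding and its parameters are not part of the library):

```python
import time


class RateLimitedEmbedding:
    """Hypothetical wrapper that spaces out calls to an embed function."""

    def __init__(self, embed_fn, min_interval=0.1):
        self._embed_fn = embed_fn        # e.g. CohereEmbeddings().embed_query
        self._min_interval = min_interval
        self._last_call = 0.0

    def embed(self, text):
        # Wait until at least min_interval seconds since the previous call
        elapsed = time.monotonic() - self._last_call
        if elapsed < self._min_interval:
            time.sleep(self._min_interval - elapsed)
        self._last_call = time.monotonic()
        return self._embed_fn(text)
```

Catching a provider-specific error (like CohereError) would then be confined to this one wrapper rather than spread through the index code.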
Ah I see... yeah, that's a fair point. I might need to think a bit more about this