Rate limit

But I guess the main reason is that, even though I am using an LLM from HuggingFaceHub via the ServiceContext, it is still looking for OpenAI

File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\tenacity__init.py", line 382, in call__
result = fn(args, kwargs) ^^^^^^^^^^^^^^^^^^^ File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\llama_index\embeddings\openai.py", line 150, in get_embeddings data = openai.Embedding.create(input=list_of_text, model=engine, kwargs).data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\openai\api_resources\embedding.py", line 33, in create response = super().create(args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\openai\api_resources\abstract\engine_apiresource.py", line 153, in create response, , api_key = requestor.request(
^^^^^^^^^^^^^^^^^^
File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\openai\api_requestor.py", line 230, in request
resp, got_stream = self._interpret_response(result, stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\openai\api_requestor.py", line 624, in _interpret_response
self._interpret_response_line(
File "C:\Users\yoges\anaconda3\envs\langchain\Lib\site-packages\openai\api_requestor.py", line 687, in _interpret_response_line
raise self.handle_error_response(
openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details.
Now the error is being generated in langchain land, but the suggested version does not solve it
8 comments
Looks like a rate limit error now 😅 Do you have payment info on your OpenAI account?
My OpenAI quota is exhausted, that's why I'm trying a HuggingFace LLM
But LangChain does not know about that, I think
Should the LlamaIndex ServiceContext pass that info to LangChain by some means?
Right. You'll still need to set an embed model though (LlamaIndex uses two separate models: an LLM and an embedding model)

You can run a local embed model from huggingface to avoid openai

https://gpt-index.readthedocs.io/en/latest/how_to/customization/embeddings.html#custom-embeddings
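
For reference, a minimal sketch of such a local embed model, assuming the 0.6.x-era LangchainEmbedding wrapper and langchain's HuggingFaceEmbeddings (the model name here is only an illustrative default):

from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding

# Embeddings are computed locally by a sentence-transformers model,
# so no OpenAI key or quota is needed for the embedding step.
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))

The embed_model is then passed to ServiceContext.from_defaults(embed_model=...), as in the snippet further down.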
The error mentions langchain because we use some basic langchain classes under the hood 🙂
The following seems to work, so two models, one for the LLMPredictor and one for embedding (by default it's a sentence-transformers model inside)... looks ok? @Logan M
repo_id = "tiiuae/falcon-7b"
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

llm_predictor = LLMPredictor(llm=HuggingFaceHub(repo_id=repo_id,
model_kwargs={"temperature": 0.1, 'truncation': 'only_first',
"max_length": 512}))
service_context = ServiceContext.from_defaults(chunk_size=64, llm_predictor=llm_predictor, embed_model=embed_model)
Looks good to me! (64 chunk size is pretty small though lol but up to you)
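
For completeness, a hedged sketch of how that service context would then be used end to end, assuming the 0.6.x-era GPTVectorStoreIndex / SimpleDirectoryReader API and a hypothetical ./data folder of documents:

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# Load documents, build a vector index, and query it;
# embeddings come from the local HuggingFace model, completions from falcon-7b.
documents = SimpleDirectoryReader("./data").load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
print(query_engine.query("What is this document about?"))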