How do I avoid rate-limit errors when generating OpenAI embeddings in an ingestion pipeline? The retry logic doesn't seem to help: it eventually just fails after 6 attempts. How can I track how many tokens are actually being sent per request, so I can rate-limit from my own app instead of relying on retries?
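One app-side approach (a sketch, not tied to any particular SDK) is to estimate the token count of each batch before sending it, and gate requests against your tokens-per-minute budget with a sliding window. The `TokenRateLimiter` class and the TPM figure below are illustrative assumptions; the character-based heuristic is rough, and for exact counts you would use OpenAI's `tiktoken` library (e.g. `tiktoken.get_encoding("cl100k_base")` for the embedding models) instead.

```python
import time


def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    For exact counts, use tiktoken instead:
        len(tiktoken.get_encoding("cl100k_base").encode(text))
    """
    return max(1, len(text) // 4)


class TokenRateLimiter:
    """Sliding-window limiter: block until a batch fits the TPM budget.

    Hypothetical helper for illustration; tokens_per_minute should match
    the limit shown for your account in the OpenAI dashboard.
    """

    def __init__(self, tokens_per_minute: int = 1_000_000):
        self.budget = tokens_per_minute
        self.window: list[tuple[float, int]] = []  # (timestamp, tokens)

    def wait_for(self, tokens: int) -> None:
        while True:
            now = time.monotonic()
            # Drop entries older than 60 seconds from the window.
            self.window = [(t, n) for t, n in self.window if now - t < 60]
            used = sum(n for _, n in self.window)
            if used + tokens <= self.budget:
                self.window.append((now, tokens))
                return
            time.sleep(1)  # budget exhausted; wait for the window to drain
```

In the ingestion loop you would call `limiter.wait_for(sum(approx_tokens(t) for t in batch))` before each embedding request; because you throttle before sending, the server-side 429s (and the 6-retry failures) should largely disappear.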
I've got a citation query engine that works fine with a synchronous query. When I switch it to use aquery, this call:

    streaming_response = await query_engine.aquery(question)

fails with:

    Error: AsyncStreamingResponse.__init__() got an unexpected keyword argument 'response_gen'