snackbar
Offline, last seen 3 months ago
Joined September 25, 2024
How do I avoid rate limit errors when generating OpenAI embeddings in an ingestion pipeline? The retry logic doesn't seem to work properly; it eventually just fails after 6 tries. How can I track how many tokens are actually being sent in requests, so I can rate limit properly from my app?
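
One approach, as a minimal sketch rather than anything built into the pipeline: count tokens client-side with tiktoken before each batch and sleep whenever a rolling one-minute budget would be exceeded. The model name and TOKENS_PER_MINUTE value below are assumptions to replace with your own model and account limit.

import time

import tiktoken
from openai import OpenAI

MODEL = "text-embedding-3-small"  # assumption: use the model you actually embed with
TOKENS_PER_MINUTE = 1_000_000     # assumption: your account's TPM limit

client = OpenAI()
encoding = tiktoken.encoding_for_model(MODEL)

window_start = time.monotonic()
tokens_used = 0

def embed_batch(texts):
    """Embed one batch, sleeping first if it would exceed the TPM budget."""
    global window_start, tokens_used
    batch_tokens = sum(len(encoding.encode(t)) for t in texts)
    now = time.monotonic()
    if now - window_start >= 60:  # roll the one-minute window
        window_start, tokens_used = now, 0
    if tokens_used + batch_tokens > TOKENS_PER_MINUTE:
        time.sleep(60 - (now - window_start))  # wait out the current window
        window_start, tokens_used = time.monotonic(), 0
    tokens_used += batch_tokens
    resp = client.embeddings.create(model=MODEL, input=texts)
    return [d.embedding for d in resp.data]

The embeddings response also reports what the API actually counted in resp.usage.total_tokens, so you can log that and compare it against the tiktoken estimate to calibrate the budget.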
3 comments
I've got a citation query engine that works fine with a synchronous query. When I change it to use aquery, I get this error:
streaming_response = await query_engine.aquery(question)

Error: AsyncStreamingResponse.__init__() got an unexpected keyword argument 'response_gen'

Am I doing something wrong?
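
For comparison, here is the async streaming pattern as a sketch. It assumes an index built elsewhere, and passing a streaming response synthesizer through CitationQueryEngine.from_args is an assumption to verify against your installed version. The traceback itself looks like it could be a version mismatch between the query engine and the response schema in llama-index-core, so upgrading both packages together may be the actual fix.

import asyncio

from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import CitationQueryEngine

async def main():
    # `index` is assumed to be a VectorStoreIndex you built elsewhere.
    # Build a streaming synthesizer explicitly rather than relying on a
    # streaming kwarg on the engine itself.
    synth = get_response_synthesizer(streaming=True, use_async=True)
    query_engine = CitationQueryEngine.from_args(index, response_synthesizer=synth)
    streaming_response = await query_engine.aquery("your question here")
    # AsyncStreamingResponse exposes the tokens as an async generator.
    async for token in streaming_response.async_response_gen():
        print(token, end="", flush=True)

asyncio.run(main())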
2 comments
I'm trying to modify the OpenAIEmbedding class to rate limit properly, but I'm not sure if this is the right module to modify:
https://github.com/run-llama/llama_index/blob/47ec97fd11776fa701a3e0e5b2865f7eaacb215a/llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/base.py#L222
I've made changes to this file, uninstalled llama-index, and run 'poetry install --with dev', but the changes don't seem to have taken effect.
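
Two things usually go wrong here. First, if the pyproject still resolves llama-index-embeddings-openai from PyPI, poetry installs the published wheel and your edited checkout never gets imported; an editable install (pip install -e path/to/llama-index-embeddings-openai, or a poetry path dependency with develop = true) makes Python import your copy. Second, you can avoid patching the package at all by subclassing. A sketch, assuming _get_text_embeddings is the batch hook in your installed version (it matches the file linked above, but check):

import time
from typing import List

from llama_index.embeddings.openai import OpenAIEmbedding

class ThrottledOpenAIEmbedding(OpenAIEmbedding):
    """OpenAIEmbedding that pauses between embedding batches."""

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        time.sleep(1.0)  # crude fixed delay; swap in a token-bucket for real use
        return super()._get_text_embeddings(texts)

embed_model = ThrottledOpenAIEmbedding(model="text-embedding-3-small")

If your ingestion pipeline embeds asynchronously, override _aget_text_embeddings the same way, then pass embed_model into the pipeline instead of the stock class.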
2 comments