
Updated 6 months ago


At a glance

The community members are discussing how to reduce the number of requests made to the OpenAI API. One community member suggests increasing the embedding batch size, as the default is 10. Another community member provides code examples that set the batch size to 100 and update the service context accordingly. However, the original poster notes that the change is not working as expected. The community members continue to discuss potential solutions, but there is no explicitly marked answer.

Useful resources
How do I reduce the number of requests made to the OpenAI API?
7 comments
Are you hitting rate limits?

Probably for embeddings. You could try lowering the embedding batch size (default is 10)

https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/usage_pattern.html#batch-size
You mean increase the batch size, no? I need to reduce the number of requests made to OpenAI.
Right whoops, was confusing token limits with rate limits lol
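As a rough sketch of why the batch size matters here: each batch of text chunks is sent as one embedding API call, so the number of requests is roughly the chunk count divided by `embed_batch_size`, rounded up. The chunk count below (1,000) is a made-up number for illustration:

```python
import math

def embedding_requests(num_chunks: int, embed_batch_size: int) -> int:
    """Approximate number of embedding API requests:
    each batch of chunks is one call, so round up."""
    return math.ceil(num_chunks / embed_batch_size)

# Hypothetical corpus of 1,000 chunks:
print(embedding_requests(1000, 10))   # default batch size 10  -> 100 requests
print(embedding_requests(1000, 100))  # embed_batch_size=100   -> 10 requests
```

So raising the batch size from the default 10 to 100 cuts the embedding request count by roughly 10x, which is why increasing (not lowering) it helps with per-request rate limits.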
from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()
service_context = ServiceContext.from_defaults(embed_model=embed_model)

embed_model = OpenAIEmbedding(embed_batch_size=100)
Do I just need to add this to my code, or am I forgetting anything? Because it's not working
you'll have to set it in the service context after changing the batch size
probably easier to set a global context too

from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding(embed_batch_size=100)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

set_global_service_context(service_context)