I also have a few concerns about concurrency. I have an app with about 200k daily active users, about 30k users online in an hour, and 1.5k online in a minute. I want to give users some AI-related functionality backed by llama-index + the OpenAI API. Assuming 10% of active users will use it, that works out to 150 calls a minute, 3k calls an hour, and 20k calls a day to llama-index. Is this even practical? What level of hardware will be needed? Thanks in advance for any suggestions!
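A quick sanity check of those numbers (the 10% adoption rate is my assumption):

```python
# Back-of-envelope load estimate from the figures above.
daily_active = 200_000   # active users per day
hourly_online = 30_000   # online users per hour
minute_online = 1_500    # online users per minute
adoption = 0.10          # assumed share of users who touch the AI feature

calls_per_min = minute_online * adoption
calls_per_hour = hourly_online * adoption
calls_per_day = daily_active * adoption

print(calls_per_min, calls_per_hour, calls_per_day)  # 150.0 3000.0 20000.0
```

So peak load is only around 2-3 requests per second, which is modest on the app side.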
Could be practical; it depends on what your llama-index app is doing.
You're saying 20k calls a day to llama-index, but llama-index will pass those off to OpenAI, so the biggest bottleneck / price point will probably be there.
It also depends on what you're worried about: the OpenAI error rate, pricing, the DB (vector stores), etc.
Thanks a ton, @bmax! What about latency? Will the latency mostly depend on the OpenAI API as well? Won't llama-index itself contribute much to the total latency?
It's an app designed for sports fans, where users can come together to discuss, interact, and compete with each other. Now I'm planning to introduce a chatbot into the app to answer some of the common questions from users.