I also have a few concerns about concurrency. I have an app with about 200k active users a day, about 30k online users in an hour, and 1.5k online users in a minute. I want to give users some AI-related functionality backed by LlamaIndex + the OpenAI API. Assuming 10% of active users use it, that would be 150 calls a minute, 3k calls an hour, and 20k calls a day to LlamaIndex. Is this even practical? What level of hardware will be needed? Thanks in advance for any suggestions!
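
For a rough sense of scale, here is a minimal sketch of bounding that concurrency on the application side, assuming LlamaIndex's async query API (aquery) and the newer llama_index.core package layout; the "data" directory and MAX_CONCURRENT value are hypothetical placeholders, not recommendations:

```python
# Minimal sketch, not a benchmark: cap in-flight LlamaIndex calls with an
# asyncio semaphore so ~150 calls/min reach OpenAI in a controlled way.
# Assumes an OPENAI_API_KEY in the environment; "data" and MAX_CONCURRENT
# are hypothetical placeholders.
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

MAX_CONCURRENT = 10  # placeholder; tune against observed OpenAI rate limits


async def main():
    index = VectorStoreIndex.from_documents(
        SimpleDirectoryReader("data").load_data()
    )
    query_engine = index.as_query_engine()
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def ask(question: str):
        async with sem:  # bounds concurrent OpenAI-backed queries
            return await query_engine.aquery(question)

    questions = ["Who plays tonight?"] * 5  # stand-in for user traffic
    for answer in await asyncio.gather(*(ask(q) for q in questions)):
        print(answer)


asyncio.run(main())
```

A semaphore only smooths bursts; sustained throughput is still capped by OpenAI's rate limits and latency, which is where the "is this practical" question really lives.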
Could be practical, depends on what your llama-index app is doing.

You're saying 20k calls to llama-index in a day, but llama-index will pass those off to OpenAI, so the biggest bottleneck / price point will probably be there.

Depends on whether you're worried about OpenAI's error rate, pricing, the db (vector stores), etc.
Yup, OpenAI will be the bottleneck here
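
Since OpenAI's error rate came up above, here is a minimal sketch of retrying transient failures with exponential backoff, assuming the tenacity library and the exception names exported by openai>=1.0 (older SDK versions expose different error classes):

```python
# Minimal sketch: retry transient OpenAI failures with exponential backoff.
# Assumes tenacity and the error classes from openai>=1.0.
from openai import APIError, RateLimitError
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)


@retry(
    retry=retry_if_exception_type((RateLimitError, APIError)),
    wait=wait_exponential(multiplier=1, min=1, max=30),  # 1s, 2s, 4s ... 30s
    stop=stop_after_attempt(5),
)
def query_with_retry(query_engine, question: str):
    # query_engine is any LlamaIndex query engine; its underlying LLM calls
    # are what actually raise the OpenAI errors caught above.
    return query_engine.query(question)
```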
thanks @bmax a ton! What about latency? Will the latency mostly depend on the OpenAI API as well? LlamaIndex itself won't contribute much to the total latency?
Yep, latency is mostly from OpenAI. I have an app that's doing way less traffic and I see some calls randomly fail.
But if you're not streaming, the latency for an OpenAI call is just long in general.
idk how you'd put that into a webpage/app, some product design there I guess.
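
On the streaming point, a minimal sketch of streaming tokens so users see partial output immediately rather than waiting several seconds for the full completion, assuming the streaming query engine in recent llama-index versions ("data" is again a placeholder):

```python
# Minimal sketch: stream tokens as they arrive instead of blocking on the
# full completion. Assumes llama_index.core and streaming query engines.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data").load_data()
)
query_engine = index.as_query_engine(streaming=True)

streaming_response = query_engine.query("What are tonight's matchups?")
for token in streaming_response.response_gen:
    # In a real app you'd forward each token over SSE or a websocket.
    print(token, end="", flush=True)
```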
thanks! that's very helpful information
If you feel comfortable sharing, I’d like to hear about your app.
It's an app designed for sports fans, where users can come together to discuss, interact, and compete with each other. Now I'm planning to introduce a chatbot into the app to answer some of the common questions from users.