ishara3512
Joined September 25, 2024
Hi, I am using LlamaIndex with a local open-source model, and the backend was built with FastAPI. Latency for concurrent users was very high, so I tried increasing the number of workers, but then the model had to be loaded once per worker, and I do not have enough GPU memory for that. How can I handle this with LlamaIndex and FastAPI?
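A common pattern for the situation described above is to keep a single model copy in one process and funnel all requests through an in-process queue, instead of letting each Uvicorn worker load its own copy. The sketch below is framework-agnostic (no FastAPI or LlamaIndex imports); `DummyModel` and `InferenceService` are hypothetical names standing in for the real local model and a service object you would create once at app startup (e.g. in a FastAPI lifespan handler) while running a single worker process:

```python
import asyncio


class DummyModel:
    """Stand-in for the local LLM; in practice this is loaded exactly once."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class InferenceService:
    """One shared model instance; all handlers enqueue requests here,
    so only one generation occupies the GPU at a time."""

    def __init__(self, model: DummyModel) -> None:
        self.model = model
        self.queue: asyncio.Queue = asyncio.Queue()

    async def worker(self) -> None:
        # Single consumer: pulls requests off the queue one by one.
        while True:
            prompt, fut = await self.queue.get()
            # Run the blocking generate() in a thread so the event loop
            # keeps accepting new requests while the model is busy.
            result = await asyncio.to_thread(self.model.generate, prompt)
            fut.set_result(result)
            self.queue.task_done()

    async def infer(self, prompt: str) -> str:
        # Called from each request handler; awaits until the shared
        # worker has produced this request's result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut


async def main() -> list[str]:
    svc = InferenceService(DummyModel())
    worker = asyncio.create_task(svc.worker())
    # Three "concurrent users" sharing the single model copy.
    results = await asyncio.gather(*(svc.infer(f"q{i}") for i in range(3)))
    worker.cancel()
    return results


print(asyncio.run(main()))  # → ['echo: q0', 'echo: q1', 'echo: q2']
```

With this layout you run one Uvicorn worker (so the model loads once) and rely on async concurrency for I/O; if you need more GPU throughput, the usual next step is to move the model behind a dedicated inference server and have FastAPI call it over HTTP.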