ishara3512
Joined September 25, 2024
Hi, I am using LlamaIndex with a local open-source model, and the backend was built with FastAPI. Latency for concurrent users was very high, so I tried increasing the number of workers, but then the model had to be loaded once per worker, and I do not have enough GPU memory for that. How can I handle this with LlamaIndex and FastAPI?
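A common pattern for the situation described above is to keep a single model copy in one process and funnel all requests through an in-process queue, instead of letting each Uvicorn worker load its own copy. The sketch below is framework-agnostic (no FastAPI or LlamaIndex imports); `DummyModel` and `InferenceService` are hypothetical names standing in for the real local model and a service object you would create once at app startup (e.g. in a FastAPI lifespan handler) while running a single worker process:

```python
import asyncio


class DummyModel:
    """Stand-in for the local LLM; in practice this is loaded exactly once."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class InferenceService:
    """One shared model instance; all handlers enqueue requests here,
    so only one generation occupies the GPU at a time."""

    def __init__(self, model: DummyModel) -> None:
        self.model = model
        self.queue: asyncio.Queue = asyncio.Queue()

    async def worker(self) -> None:
        # Single consumer: pulls requests off the queue one by one.
        while True:
            prompt, fut = await self.queue.get()
            # Run the blocking generate() in a thread so the event loop
            # keeps accepting new requests while the model is busy.
            result = await asyncio.to_thread(self.model.generate, prompt)
            fut.set_result(result)
            self.queue.task_done()

    async def infer(self, prompt: str) -> str:
        # Called from each request handler; awaits until the shared
        # worker has produced this request's result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut


async def main() -> list[str]:
    svc = InferenceService(DummyModel())
    worker = asyncio.create_task(svc.worker())
    # Three "concurrent users" sharing the single model copy.
    results = await asyncio.gather(*(svc.infer(f"q{i}") for i in range(3)))
    worker.cancel()
    return results


print(asyncio.run(main()))  # → ['echo: q0', 'echo: q1', 'echo: q2']
```

With this layout you run one Uvicorn worker (so the model loads once) and rely on async concurrency for I/O; if you need more GPU throughput, the usual next step is to move the model behind a dedicated inference server and have FastAPI call it over HTTP.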