The community member is building an agent that decomposes queries about internal documentation. The agent works fine, and the community member is now looking to push it to production. The inputs go through a FastAPI app served by Gunicorn, with Nginx as a reverse proxy. The community member anticipates having quite a few users and simultaneous queries, and asks about the best practice for parallelizing agents and whether Gunicorn handles that by specifying the number of workers.
A comment from another community member suggests that the approach the community member is considering is the best way, but each thread/worker/request will need its own instance of the agent.
I am building an agent that decomposes queries about internal documentation. The agent works fine, and I am now looking at pushing it to production. All my inputs go through a FastAPI Gunicorn instance, with Nginx in front as a reverse proxy. However, I will have quite a few users and can anticipate simultaneous queries. What is the best practice for parallelizing agents? Does Gunicorn do that by specifying the number of workers?
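Following the comment above, here is a minimal sketch of the per-worker/per-request pattern. Gunicorn's `--workers` flag starts separate processes, each of which imports the app and can hold its own agent, and the request handler constructs a fresh agent per request so no mutable state is shared between simultaneous queries. The `EchoAgent` class and `build_agent()` factory are placeholders, not the poster's actual code.

```python
# app.py -- a minimal sketch of the per-worker / per-request pattern, with
# EchoAgent and build_agent() as hypothetical stand-ins for the real
# query-decomposition agent.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Query(BaseModel):
    text: str


class EchoAgent:
    """Placeholder agent: holds per-instance state, so it must not be shared across requests."""

    def run(self, text: str) -> str:
        return f"decomposed: {text}"


def build_agent() -> EchoAgent:
    # Hypothetical factory: in the real app this would wire up the LLM client,
    # retrievers, and prompts for the internal-documentation agent.
    return EchoAgent()


@app.post("/query")
def run_query(query: Query) -> dict:
    # A plain (non-async) endpoint: FastAPI runs it in a threadpool, so one slow,
    # blocking agent call does not stall other requests handled by the same worker.
    agent = build_agent()  # fresh instance per request -> no shared mutable state
    return {"answer": agent.run(query.text)}


# Parallelism across CPU cores comes from Gunicorn's worker processes, e.g.:
#   gunicorn app:app -k uvicorn.workers.UvicornWorker --workers 4 --bind 127.0.0.1:8000
# Each worker imports this module independently, so each one holds its own agent instances,
# and Nginx keeps proxying to the single bound address as before.
```

If constructing the agent is expensive (loading models, building indexes), the alternative is to build one instance per worker at import time or in a startup hook, and guard it with a lock or a small pool if it is not thread-safe.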