Azure

At a glance

The community member is looking for a simple solution to deploy an inference model on GPU in Azure that can scale to zero when not in use and spin up when the endpoint is called. They assume Kubernetes is the best approach, but it is not easy to work with directly. The community member expects there is an existing framework or solution that makes this process easy.

In the comments, another community member suggests that Azure might already offer something for this, and that Kubernetes is a great choice, though it has a learning curve. The original poster responds that they haven't found any solutions with Azure, aside from possibly Nvidia Triton Inference Server, which they are not certain can scale to zero. The poster mentions they are researching Fermyon and Knative, which are frameworks that sit on top of Kubernetes and might make it easier to manage.

nickjtay

I'm trying to come up with a set of steps to deploy an inference model on GPU in Azure that can scale to zero when not in use and spin up when the endpoint is called.

This seems like it would be a common problem, because many companies want a closed-off ChatGPT with RAG to prevent data leakage. Additionally, GPUs are expensive, so ideally you pay only for the compute you use. I am assuming that Kubernetes is the best approach, but Kubernetes is not easy to work with directly for many reasons. I would therefore expect that there is an existing framework or solution that makes this process easy. Are there some simple solutions to this problem?


Logan M

I'd be surprised if Azure didn't already offer something for this.

But also, Kubernetes is a great choice too (bit of a learning curve, but worth it imo).

nickjtay

@Logan M Thank you. As far as I can tell, Azure has no solution for this at the moment, aside from maybe Nvidia Triton Inference Server, which can be launched on Azure. I'm not certain it can scale to zero, but from what I can tell it offers easier control over setup and scaling. After some more searching, it seems there are frameworks that sit on top of Kubernetes, make it easier to manage, and could be launched on Azure. I'm currently researching Fermyon and Knative.
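For Knative specifically, scale-to-zero is configured per service through autoscaling annotations. The sketch below is a minimal, unverified illustration that builds a Knative Serving `Service` manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. It assumes Knative Serving is installed on the cluster with scale-to-zero enabled and that GPU nodes run the NVIDIA device plugin (so `nvidia.com/gpu` is a schedulable resource). The service name, image URL, port, and `max_scale` value are hypothetical placeholders, not anything from this thread.

```python
import json

# Hypothetical placeholders: substitute your own service name and container image.
SERVICE_NAME = "llm-inference"
IMAGE = "example.azurecr.io/llm-inference:latest"


def knative_gpu_service(name: str, image: str, max_scale: int = 2) -> dict:
    """Build a Knative Serving Service manifest whose pods may scale to zero."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Allow the revision to drop to zero pods when there is no traffic
                        "autoscaling.knative.dev/min-scale": "0",
                        # Cap how many GPU pods a burst of requests can create
                        "autoscaling.knative.dev/max-scale": str(max_scale),
                    }
                },
                "spec": {
                    "containers": [
                        {
                            "image": image,
                            # Assumed: the inference server listens on port 8080
                            "ports": [{"containerPort": 8080}],
                            # Request one NVIDIA GPU per pod (needs the NVIDIA device plugin)
                            "resources": {"limits": {"nvidia.com/gpu": "1"}},
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be piped to a file and applied with kubectl
    print(json.dumps(knative_gpu_service(SERVICE_NAME, IMAGE), indent=2))
```

Saving the output (e.g. `python make_service.py > service.json`) and running `kubectl apply -f service.json` would create the service. Note that scaling pods to zero only saves GPU cost if the underlying GPU node pool can also scale down, for example an autoscaled AKS node pool with a minimum of zero nodes; the first request after idling will then wait for both a node and the model to start.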
