Why is that contradictory? In the majority of applications, you'll need to scale it somehow for production use (e.g. in a k8s cluster, like EKS on AWS).
llama.cpp is cool, but it couldn't support a production app right now: it can only process requests sequentially, has no dynamic batching, and is slower than proper CUDA implementations.
It's contradictory because people who want to run AI apps locally probably do it out of concerns about cloud-computing costs, data privacy, and security.
Putting your app on Amazon's property seemingly goes against that idea.
Fair enough. AWS was a suggestion because it's easy to spin up. You could of course run a local k8s cluster instead, but you'll need a decent amount of hardware depending on the traffic you want to handle (that's the nice thing about k8s, though: it runs on any server).
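For what it's worth, a self-hosted setup like that can be sketched as a minimal k8s Deployment plus Service for an inference server. This is just an illustration under assumptions: the image tag, model path, and resource limits are placeholders you'd swap for your own, and the port follows the common convention of an OpenAI-compatible server listening on 8080.

```yaml
# Hypothetical sketch: serving an LLM inference server on a local k8s cluster.
# Image, model path, and resource numbers are placeholders, not recommendations.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server
spec:
  replicas: 2                      # scale horizontally across your nodes
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      containers:
        - name: server
          image: your-registry/llm-server:latest   # placeholder image
          args: ["--model", "/models/model.gguf"]  # placeholder model path
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "8"
              memory: 32Gi         # size to your model; GPUs need a device plugin
---
apiVersion: v1
kind: Service
metadata:
  name: llm-server
spec:
  selector:
    app: llm-server
  ports:
    - port: 80
      targetPort: 8080
```

The same manifests apply unchanged whether the cluster is EKS or a box under your desk running k3s or kind, which is the "runs on any server" point above.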