
How do enterprises host local LLMs?

How do enterprises host local LLM models? I know you can host local LLM models on your own computer using Ollama, but how do enterprises do that? Do they use AWS? I need some insights, thank you.
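For context, by "on your computer with Ollama" I mean something like this minimal sketch, assuming Ollama is running on its default port (11434) and a model like llama3 has already been pulled; the model name is just an example:

```python
# Minimal sketch: querying a locally hosted model through Ollama's HTTP API.
# Assumes Ollama is running locally and `ollama pull llama3` was done beforehand.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",        # assumed model name; substitute whatever you pulled
        "prompt": "Why would an enterprise self-host an LLM?",
        "stream": False,          # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])    # generated text
```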
8 comments
Yeah, AWS, or auto-scaling on some K8s platform
probably using vLLM or TGI
or some other custom in-house server
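For illustration, here's a rough sketch of the client side against a vLLM server (TGI exposes a similar OpenAI-compatible API). The model name, host, and port are assumptions; it presumes the server was started with something like `vllm serve <model>` on localhost:8000:

```python
# Sketch: talking to a self-hosted vLLM server through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # no real key needed locally

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # assumption: must match the model the server was launched with
    messages=[{"role": "user", "content": "Summarize why enterprises self-host LLMs."}],
    max_tokens=200,
)
print(completion.choices[0].message.content)
```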
I've been playing around with llama.cpp and can confirm it works locally too, even without an Internet connection.
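Roughly what that looks like through the llama-cpp-python bindings; the GGUF path is just a placeholder for whatever model file you have on disk, and nothing here touches the network:

```python
# Sketch: running llama.cpp fully offline via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=2048)  # assumed local path
out = llm("Q: What is continuous batching? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```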

I find the idea of local hosting on AWS contradictory. SageMaker and Bedrock are pretty solid.
Why is that contradictory? 👀 In a majority of applications, you'll need to scale it somehow for production usage (e.g. in a k8s cluster, like EKS on AWS).

llama.cpp is cool, but it couldn't support a production app right now (it can only process requests sequentially, has no dynamic batching, and is slower than actual CUDA implementations).
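To make the batching point concrete, a rough sketch: fire a bunch of requests concurrently at an OpenAI-compatible server (endpoint and model name assumed, same as above). A server with continuous batching like vLLM overlaps them on the GPU; a strictly sequential server would grind through them one by one.

```python
# Sketch: many concurrent requests, which a dynamically batching server handles in parallel.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed endpoint

async def ask(i: int) -> str:
    resp = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",   # assumed model name
        messages=[{"role": "user", "content": f"Give me fact #{i} about GPUs."}],
        max_tokens=50,
    )
    return resp.choices[0].message.content

async def main():
    answers = await asyncio.gather(*(ask(i) for i in range(32)))  # 32 in-flight requests
    print(len(answers), "responses received")

asyncio.run(main())
```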
It's contradictory because people who want to run AI apps locally probably do it out of concerns about the cost of cloud computing, data privacy and security.

Putting your app on Amazon's property seemingly goes against that idea.
Fair enough. AWS was a suggestion because it's easy to spin up. You could of course spin up a local K8s cluster, but you'll also need a decent amount of hardware depending on the traffic you want to handle (but that's the nice thing about k8s: it runs on any server).
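For the k8s route, a hedged sketch of what the deployment could look like via the official kubernetes Python client; the image tag, model name, replica count, and GPU request are all assumptions, not a tested production config:

```python
# Sketch: a k8s Deployment for a vLLM server, created with the kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # uses your kubeconfig (local cluster or EKS alike)

container = client.V1Container(
    name="vllm",
    image="vllm/vllm-openai:latest",                        # assumed image tag
    args=["--model", "meta-llama/Llama-3.1-8B-Instruct"],   # assumed model name
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),  # one GPU per pod
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="vllm-server"),
    spec=client.V1DeploymentSpec(
        replicas=2,                                          # assumed replica count
        selector=client.V1LabelSelector(match_labels={"app": "vllm"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "vllm"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```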