Hi again, just trying to see if anyone can offer any advice on how to run a local Llama 2 LLM on one server and query it from another. Is there an out-of-the-box way to run llama-cpp-python[server] on one server and point the llama-index endpoint at something like http://192.168.0.19:8000 to query it? I see that I can run the model on the same machine via LlamaCPP, but I'm looking to have a beefy server running just the LLM, with the querying code somewhere else. I'm also noticing that running Llama 2 via llama-cpp-python[server] gives me results much faster than running through LlamaCPP; just wondering if anyone else has seen this too. Thanks.
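For context, llama-cpp-python's server exposes an OpenAI-compatible HTTP API, so once it is running on the remote box it can already be queried from another machine with a plain HTTP request. A minimal sketch, assuming the server is reachable at the 192.168.0.19:8000 address from the question and using its /v1/completions route (the model path and prompt here are just placeholders):

```python
import requests

# Assumes llama-cpp-python[server] is already running on the remote machine,
# started with something like:
#   python -m llama_cpp.server --model <path-to-model> --host 0.0.0.0 --port 8000
LLM_SERVER = "http://192.168.0.19:8000"  # address from the question

resp = requests.post(
    f"{LLM_SERVER}/v1/completions",  # OpenAI-compatible completions route
    json={
        "prompt": "Q: What is the capital of France? A:",
        "max_tokens": 64,
        "temperature": 0.1,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

The question is really about wiring that endpoint into llama-index, which is what the reply below addresses.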
2 comments
No out-of-the-box way at the moment. An LLM integration for llama-cpp-server would be cool though!

In the meantime, the other option is writing your own LLM class that extends the CustomLLM class and forwards prompts to your server. It would look very similar to the LlamaCPP class, I'm guessing?
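A rough sketch of what that custom class could look like, assuming llama-index's CustomLLM base class and the OpenAI-compatible /v1/completions route that llama-cpp-python's server exposes. The class name, default address, and context-window numbers are placeholders, and the import paths vary across llama-index versions:

```python
import requests
from llama_index.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback


class RemoteLlamaCpp(CustomLLM):
    """Forwards prompts to a llama-cpp-python[server] running on another machine."""

    base_url: str = "http://192.168.0.19:8000"  # address from the question
    context_window: int = 3900
    num_output: int = 256
    model_name: str = "llama-2-remote"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        # The remote server mimics the OpenAI completions API.
        resp = requests.post(
            f"{self.base_url}/v1/completions",
            json={"prompt": prompt, "max_tokens": self.num_output},
            timeout=300,
        )
        resp.raise_for_status()
        text = resp.json()["choices"][0]["text"]
        return CompletionResponse(text=text)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs) -> CompletionResponseGen:
        # Simplest non-streaming fallback: yield the full completion once.
        yield self.complete(prompt, **kwargs)
```

From there it should slot into llama-index like any other LLM, e.g. via `ServiceContext.from_defaults(llm=RemoteLlamaCpp())`, though again the exact wiring depends on the llama-index version in use.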
Ok thanks Logan, I will take a stab at writing a custom LLM for this purpose.