The community member is hosting a llama.cpp server locally and wants to use it as the LLM backend for a Retrieval-Augmented Generation (RAG) implementation. The comments suggest using the llama-cpp-python library to set up an OpenAI-compatible web server, and then using the OpenAILike LLM from the LlamaIndex library to interact with the local server. However, the community member is getting "weird" or "garbage" responses and is seeking help to resolve this problem.
Yes, I am doing this. Now I want to use this server in my RAG app so that the LLM calls hit this server, instead of declaring a LlamaCPP instance again in my RAG app.
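A minimal sketch of this wiring, assuming the llama-cpp-python server was started locally with something like `python -m llama_cpp.server --model ./models/your-model.gguf --port 8000` (the model path and port are placeholders) and that `OpenAILike` is importable from `llama_index.llms.openai_like` (the import path differs across LlamaIndex versions):

```python
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at the locally hosted OpenAI-compatible endpoint.
# The model name and API key are placeholders; llama-cpp-python's server
# ignores the key, but the client still expects one to be set.
llm = OpenAILike(
    model="local-gguf-model",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
    is_chat_model=True,
)

# Quick sanity check that requests reach the local server.
print(llm.complete("Hello"))
```

This same `llm` object can then be passed to the chat engine or query engine (or set globally, e.g. via `Settings.llm` in newer LlamaIndex releases), so no separate LlamaCPP instance is needed inside the RAG app.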
I think this is what I want to try. Right now the response from the chat engine is weird. Even if I just say hello, it responds with something like "user: some garbage Assistant: some garbage". Any help on how to fix it?
Not from the hosted model, but when I try it in a notebook by loading the GGUF model with LlamaCPP from llama_index.llms, it gives weird responses. Is there a way I can fix that?
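Replies that echo role tags like "user:" / "Assistant:" when a GGUF is loaded directly are often a prompt-template mismatch between what the model expects and what the wrapper sends. A minimal sketch of one possible fix, assuming a Llama-2-chat style GGUF, the `llama_index.llms.llama_cpp` package with its bundled prompt helpers (older LlamaIndex versions expose `LlamaCPP` under `llama_index.llms`), and a placeholder model path:

```python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

# Wrap the local GGUF model and apply explicit prompt formatters so the
# model receives the chat template it was trained with, instead of raw
# "user:" / "Assistant:" text it may not understand.
llm = LlamaCPP(
    model_path="./models/your-model.gguf",  # placeholder path
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    messages_to_prompt=messages_to_prompt,      # formats chat messages
    completion_to_prompt=completion_to_prompt,  # formats plain completions
    verbose=False,
)

print(llm.complete("Hello"))
```

If the model is not a Llama-2-chat variant, these bundled helpers will still produce the wrong template, so a custom `messages_to_prompt` matching the model's own chat format (or, for the hosted route, launching llama-cpp-python's server with the appropriate `--chat_format`) would be needed.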
I still need to try the hosted model using OpenAILike now.