I am hosting a llama.cpp server locally

I am hosting a llama.cpp server locally. How can I use this server in a RAG implementation to make LLM calls?
pip install llama-cpp-python[server]
python3 -m llama_cpp.server --model models/7B/llama-model.gguf
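
Once the server is up, it exposes OpenAI-compatible endpoints, so any OpenAI-style client can call it. A minimal sanity check, assuming the default localhost:8000 address (adjust if you changed the host or port):

import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "llama-model",  # the name is not critical for a single-model server
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])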

Yes, I am doing this. Now I want to use this server in my RAG app so that LLM calls hit it, instead of declaring another LlamaCPP instance inside the RAG app.
Okay, if that's the case then you should be able to make this work with the OpenAILike LLM: https://docs.llamaindex.ai/en/stable/api_reference/llms/openai_like.html#openailike
You need to set OPENAI_API_BASE to the localhost server you're running llama.cpp on.
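
For example, a minimal sketch of pointing OpenAILike at the local server; the model name and the default http://localhost:8000/v1 base URL are assumptions, and the import path depends on your llama_index version:

from llama_index.llms import OpenAILike  # newer versions: from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="llama-2-7b-chat",              # placeholder; any name works for a single-model server
    api_base="http://localhost:8000/v1",  # the llama.cpp server's OpenAI-compatible base URL
    api_key="fake",                       # any non-empty string; the local server doesn't check it
    is_chat_model=True,
)
print(llm.complete("Hello"))

Passing this llm into your index or chat engine (via ServiceContext or Settings, depending on version) routes all LLM calls in the RAG app to the local server.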
I think this is what I want to try. Right now the response from the chat engine is weird. Even when I just say hello, it responds with something like "User: some garbage
Assistant: some garbage". Any help on how to fix it?
Hmm, there is something about formatting the prompts that maybe you have to do?
Not from the hosted model, but when I try it in a notebook by loading the GGUF model with LlamaCPP from llama_index.llms, it gives weird responses. Is there a way I can fix that?

I still need to try the hosted model using OpenAILike.
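
If the garbage output comes from mismatched prompt formatting, something along these lines may help; this is a sketch assuming a Llama-2-chat style GGUF and 0.9.x-era llama_index import paths:

from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    model_path="models/7B/llama-model.gguf",    # same file the server is hosting
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    messages_to_prompt=messages_to_prompt,      # wraps chat messages in the Llama-2 [INST] format
    completion_to_prompt=completion_to_prompt,  # wraps plain completions the same way
    verbose=False,
)
print(llm.complete("Hello"))

Without these formatting hooks the model often keeps role-playing both sides, which looks like the "User: ... Assistant: ..." garbage described above.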
Would you mind submitting an issue along with code to replicate the garbage that's being output?
I will need to investigate more deeply to find a potential resolution.
It was working two days ago. I will try tweaking the parameters and prompts; if it's still not resolved, I will submit an issue.