The community member is hosting a llama.cpp server locally and wants to use it as the LLM backend for a Retrieval-Augmented Generation (RAG) implementation. The comments suggest using the llama-cpp-python library to set up an OpenAI-compatible web server, and then using the OpenAILike LLM from the LlamaIndex library to interact with the local server. However, the community member is getting "weird" or "garbage" responses and is seeking help to resolve this problem.
Yes, I am doing this. Now I want to use this server in my RAG app so that the LLM calls hit this server, instead of declaring a LlamaCPP instance again in my RAG app.
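A minimal sketch of this wiring, assuming the llama-cpp-python server was started locally with something like `python -m llama_cpp.server --model ./models/your-model.gguf --port 8000` (the model path and port are placeholders) and that `OpenAILike` is importable from `llama_index.llms.openai_like` (the import path differs across LlamaIndex versions):

```python
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at the locally hosted OpenAI-compatible endpoint.
# The model name and API key are placeholders; llama-cpp-python's server
# ignores the key, but the client still expects one to be set.
llm = OpenAILike(
    model="local-gguf-model",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
    is_chat_model=True,
)

# Quick sanity check that requests reach the local server.
print(llm.complete("Hello"))
```

This same `llm` object can then be passed to the chat engine or query engine (or set globally, e.g. via `Settings.llm` in newer LlamaIndex releases), so no separate LlamaCPP instance is needed inside the RAG app.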
I think this is what I want to try. Right now the response from the chat engine is weird. Even if I just say hello, it responds with something like "user: some garbage Assistant: some garbage". Any help on how to fix it?
Not from the hosted model, but when I try it in a notebook by loading the GGUF model with LlamaCPP from llama_index.llms, it gives weird responses. Is there a way I can fix that?
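Replies that echo role tags like "user:" / "Assistant:" when a GGUF is loaded directly are often a prompt-template mismatch between what the model expects and what the wrapper sends. A minimal sketch of one possible fix, assuming a Llama-2-chat style GGUF, the `llama_index.llms.llama_cpp` package with its bundled prompt helpers (older LlamaIndex versions expose `LlamaCPP` under `llama_index.llms`), and a placeholder model path:

```python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

# Wrap the local GGUF model and apply explicit prompt formatters so the
# model receives the chat template it was trained with, instead of raw
# "user:" / "Assistant:" text it may not understand.
llm = LlamaCPP(
    model_path="./models/your-model.gguf",  # placeholder path
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    messages_to_prompt=messages_to_prompt,      # formats chat messages
    completion_to_prompt=completion_to_prompt,  # formats plain completions
    verbose=False,
)

print(llm.complete("Hello"))
```

If the model is not a Llama-2-chat variant, these bundled helpers will still produce the wrong template, so a custom `messages_to_prompt` matching the model's own chat format (or, for the hosted route, launching llama-cpp-python's server with the appropriate `--chat_format`) would be needed.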
I still need to try the hosted model using OpenAILike now.