Yes, I am doing this. Now I want to use this server in my RAG app so that the LLM calls hit this server, instead of declaring a LlamaCPP instance again inside the RAG app.
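Something like this is what I have in mind (just a rough sketch; the host/port, model name, and dummy API key are placeholders for my setup):

```python
# Rough sketch: point llama-index at the llama.cpp server instead of a local
# LlamaCPP instance. Assumes the server is running with its OpenAI-compatible
# endpoint, e.g. started with something like:
#   llama-server -m ./models/my-model.gguf --port 8080
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="my-model",                     # placeholder; the local server mostly ignores this
    api_base="http://localhost:8080/v1",  # llama.cpp server's OpenAI-compatible endpoint
    api_key="not-needed",                 # dummy key; the local server doesn't check it
    is_chat_model=True,
    context_window=4096,
)

print(llm.complete("Hello"))
```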
I think this is what I want to try. Right now the response from the chat engine is weird: even when I just say hello, it responds with something like "user: some garbage Assistant: some garbage". Any help on how to fix it?
Not from the hosted model, but when I try it in a notebook by loading the GGUF model with LlamaCPP from llama_index.llms, it gives weird responses. Is there a way I can fix that?
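Here is roughly what I'm doing in the notebook; I'm guessing the prompt formatting is what's wrong, so maybe something like this would fix it (the model path is a placeholder, and the Llama-2-style helpers are only a guess at what my GGUF model expects):

```python
# Rough sketch of the notebook setup, with the prompt-format helpers that I
# suspect are missing. The model path is a placeholder; the Llama-2-style
# helpers only make sense if the GGUF model actually uses that chat template.
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

llm = LlamaCPP(
    model_path="./models/my-model.gguf",  # placeholder path
    temperature=0.1,
    max_new_tokens=256,
    context_window=4096,
    # Without these, the prompt sent to the model is a generic
    # "user: ... assistant: ..." transcript, which the model tends to
    # parrot back, and that looks exactly like the garbage I'm seeing.
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    model_kwargs={"n_gpu_layers": -1},    # assumption: offload to GPU if available
    verbose=False,
)

print(llm.complete("Hello"))
```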
I still need to try it with the hosted model using OpenAILike.
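Roughly what I'm planning to try (just a sketch; the data directory and model name are placeholders, and the embedding model still needs to be configured separately since it defaults to OpenAI embeddings):

```python
# Sketch of the "hosted model via OpenAILike" attempt, wired into the chat
# engine. Host/port, model name, and the ./data directory are placeholders.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

# Use the llama.cpp server for all LLM calls in the RAG app.
Settings.llm = OpenAILike(
    model="my-model",
    api_base="http://localhost:8080/v1",  # llama.cpp server endpoint
    api_key="not-needed",
    is_chat_model=True,
)

documents = SimpleDirectoryReader("./data").load_data()  # placeholder data dir
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
print(chat_engine.chat("hello"))
```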