Hi - I am working on a local RAG chatbot using Guidance, Mistral, and Postgres, and it works fine. I am wondering, though, if it's possible to reuse the LM from Guidance or from the LlamaCPP class, instead of having to keep two instances of the same model loaded. Can this be done?
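One way to avoid the double load (a sketch, not the author's code): load the GGUF weights once with llama-cpp-python and hand that same object to Guidance. To my understanding, guidance.models.LlamaCpp accepts an already-constructed llama_cpp.Llama as well as a path; the model path and n_ctx below are placeholders.

```python
def load_shared_model(model_path: str):
    # Heavy dependencies imported lazily so this helper can be defined
    # without llama-cpp-python / guidance installed.
    from llama_cpp import Llama
    from guidance import models

    base = Llama(model_path=model_path, n_ctx=4096)  # single copy of the weights in RAM
    lm = models.LlamaCpp(base)  # Guidance wraps the same instance instead of reloading
    # Use `base` for raw llama.cpp calls and `lm` inside guidance programs;
    # both point at the same loaded model.
    return base, lm
```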
I have created a table for chats and am treating it like any other RAG store. I have also built a router using Guidance that lets the LLM choose the right store to query depending on the message.
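A router like that can be sketched with Guidance's constrained `select`, which forces the model to emit exactly one of the store names. The store names and model path here are hypothetical placeholders, not the poster's actual schema; this assumes guidance >= 0.1 with a local GGUF model.

```python
STORES = ["chat_history", "documents"]  # hypothetical store/table names

def router_prompt(message: str) -> str:
    # Plain prompt text shown to the model before the constrained choice.
    return (
        "Pick the best store to answer the user's message.\n"
        f"Stores: {', '.join(STORES)}\n"
        f"Message: {message}\n"
        "Store: "
    )

def route(lm, message: str) -> str:
    # Imported lazily so the prompt helper above works without guidance installed.
    from guidance import select

    # `select` constrains generation to one of STORES, so the LLM
    # can only answer with a valid store name.
    out = lm + router_prompt(message) + select(STORES, name="store")
    return out["store"]

if __name__ == "__main__":
    from guidance import models
    # Placeholder path; load once and reuse for every routed query.
    lm = models.LlamaCpp("mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)
    print(route(lm, "What did we talk about yesterday?"))
```

The routed store name can then drive which Postgres table you pull messages or chunks from before building the final prompt.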
Is your code open source? Is there a repo or article where I can learn from it? I'm trying to set up a similar RAG chatbot with a Mistral model. Also, how long does each query take?