
Updated last year

Local RAG

At a glance

The community member is working on a local RAG chatbot using Guidance, Mistral, and Postgres, and it works fine. They are wondering whether they can reuse the language model from Guidance or from the LlamaCPP class instead of loading two instances of the same model. In the comments, the community member explains that they use Postgres for memory: they created a table for chats and treat it like any other RAG store, built a router with Guidance that lets the LLM choose the right store to query, and pull in messages/chunks as documents. They also build the prompt with the last n messages for context, and keep other tables of documents for QA/RAG. Another community member asks whether the code is open source and how long each query takes.

Hi - I am working on a local RAG chatbot using Guidance, Mistral, and Postgres, and it works fine. I am wondering, though, whether it's possible to reuse the LM from Guidance or from the LlamaCPP class instead of having two instances of the same model loaded. Can this be done?
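One possible answer, sketched below: `guidance.models.LlamaCpp` can, in recent releases, be handed an already-constructed `llama_cpp.Llama` object rather than a model path, so the weights are loaded only once and shared between Guidance programs and direct llama-cpp-python calls. This is a sketch under that assumption - verify it against the Guidance version you have installed. The model path and parameters are placeholders.

```python
def build_shared_lm(model_path: str, n_ctx: int = 4096):
    """Load one llama-cpp model and wrap the same instance for Guidance.

    Returns (llm, lm): the raw llama_cpp.Llama for direct completion calls,
    and the Guidance wrapper that reuses the same weights in memory.
    Assumption: guidance.models.LlamaCpp accepts a Llama object as well as
    a path string (check your installed version).
    """
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python
    import guidance

    llm = Llama(model_path=model_path, n_ctx=n_ctx)
    lm = guidance.models.LlamaCpp(llm)  # wraps the existing instance
    return llm, lm

# Usage (needs a local GGUF file, so not run here):
# llm, lm = build_shared_lm("mistral-7b-instruct-v0.2.Q4_K_M.gguf")
```

If the wrapper accepts the live object, the second copy of the model never gets loaded, which is usually the whole point on a single-GPU or CPU-only box.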
6 comments
You are using Postgres for memory? Could you please elaborate on how you did it? I am also working on a chatbot with memory.
I have created a table for chats and am treating it like any other RAG store. I have built a router using Guidance that lets the LLM choose the right store to query depending on the message.
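A minimal sketch of the router idea described above: the LLM picks one store name, and the app dispatches to that store's retrieval function. With Guidance the pick would normally be constrained generation (e.g. `select()` over the store names, shown in a comment since it needs a loaded model); the `choose_store` callable here is a hypothetical stand-in so the dispatch logic itself is runnable.

```python
def make_router(stores):
    """stores: dict mapping store name -> retrieval function(query) -> docs."""
    def route(query, choose_store):
        # With Guidance, the choice would be constrained generation, roughly:
        #   lm += f"Message: {query}\nBest store: " + select(list(stores), name="store")
        #   name = lm["store"]
        # Here we abstract it as a plain callable (hypothetical stand-in).
        name = choose_store(query, list(stores))
        return stores[name](query)
    return route

# Usage with a dummy chooser that always picks the chat-history store:
stores = {
    "chats": lambda q: f"chat history matching {q!r}",
    "docs":  lambda q: f"documents matching {q!r}",
}
route = make_router(stores)
print(route("what did I say yesterday?", lambda q, names: "chats"))
# → chat history matching 'what did I say yesterday?'
```

Constraining the model to a fixed set of store names is the key design point: the router can never hallucinate a store that doesn't exist.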
so basically, messages/chunks are pulled in as documents
and I also build the prompt as normal with the last n messages for context
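The "last n messages for context" step above can be sketched as a single ordered query plus string assembly. The original setup uses Postgres; `sqlite3` stands in here so the example is self-contained, but the query shape is the same. The table and column names (`chats`, `id`, `role`, `content`) are assumptions.

```python
import sqlite3

def last_n_messages(conn, n):
    # Newest n rows, then reversed so the prompt reads oldest-first.
    rows = conn.execute(
        "SELECT role, content FROM chats ORDER BY id DESC LIMIT ?", (n,)
    ).fetchall()
    return list(reversed(rows))

def build_prompt(conn, user_message, n=5):
    history = "\n".join(f"{role}: {content}" for role, content in last_n_messages(conn, n))
    return f"{history}\nuser: {user_message}\nassistant:"

# Demo with an in-memory stand-in for the Postgres chats table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chats (id INTEGER PRIMARY KEY, role TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO chats (role, content) VALUES (?, ?)",
    [("user", "hi"), ("assistant", "hello"), ("user", "what is RAG?")],
)
print(build_prompt(conn, "and how do I use it?", n=2))
```

Treating chat memory as just another table makes it queryable like the QA/RAG document tables, which is what lets one router serve both.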
other tables have other documents for QA/RAG
Is your code open source? Is there a repo or article where I can learn? I'm trying to set up a similar RAG chatbot with a Mistral model. Also, how much time does each query take?