Hey, thanks, I will try this. Anyway, I have been using the chat engine for the chat history part, so I guess this should work
They allow off-the-data conversation
I am using an open-source LLM, so I am not sure whether I can use the OpenAI one.
For condense plus context mode:
This is from the LlamaIndex docs:
This is a multi-step chat mode built on top of a retriever over your data.
For each chat interaction:
First condense a conversation and latest user message to a standalone question
Then build a context for the standalone question from a retriever,
Then pass the context along with prompt and user message to LLM to generate a response.
This approach is simple, and works for questions directly related to the knowledge base and general interactions.
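For reference, the basic setup looks roughly like this (the data path and token limit are just placeholders, the imports assume the newer llama_index.core package layout, and the LLM/embedding model are assumed to be configured separately, e.g. a local Mistral Instruct via llama.cpp):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.memory import ChatMemoryBuffer

# Load documents and build a vector index over them.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Condense-plus-context chat engine with a chat history buffer.
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
)

print(chat_engine.chat("Hi"))
```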
I don't think condense plus context gives an option to not retrieve from the vector index
It does say that it handles general interactions
But it still retrieves from the index
Just sharing an example below:
User Query 1 : Hi
Output 1 : Based on the information provided, it seems that Rt, who can be reached at rt@mit.edu, recommends Physical Therapy for treating joint pain. This therapy can help improve their physical abilities, coordination, balance, and motor skills.
User Query 2 : Hi Thank you for answering
Output 2 : You're welcome! If you have any other questions, feel free to ask and I'll do my best to help. Good luck with your therapy! If you're looking for more information on Physical Therapy, I'd be happy to provide resources and suggestions for activities you can do at home. Let me know if you'd like any specific recommendations.
it picks up stuff from the vector index
True yeah, it is retrieving from the index first
You can try checking with the LLM before passing the query to the bot and see if that works
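Roughly something like this (`needs_retrieval` and `handle` are just hypothetical helpers, not LlamaIndex APIs; `llm` and `chat_engine` are whatever you already have set up):

```python
from llama_index.core.llms import ChatMessage

def needs_retrieval(llm, user_message: str) -> bool:
    # Ask the LLM to classify whether the message needs the knowledge base.
    prompt = (
        "Answer with only YES or NO. Does the following message require "
        f"looking up information in a document knowledge base?\n\n{user_message}"
    )
    reply = llm.chat([ChatMessage(role="user", content=prompt)])
    return "YES" in reply.message.content.upper()

def handle(llm, chat_engine, user_message: str) -> str:
    if needs_retrieval(llm, user_message):
        return str(chat_engine.chat(user_message))
    # Plain LLM reply for greetings / general questions, no retrieval.
    return llm.chat([ChatMessage(role="user", content=user_message)]).message.content
```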
Hmm, that's my last resort, since it will add latency and compute cost
Have you come across guardrails?
I think the guardrails implementation in LlamaIndex is for output support
But there is something called nemo-guardrails
Haven't checked it or tried it yet, so I can't say much on this
Does NeMo integrate with LlamaIndex and llama.cpp, considering I am using Mistral Instruct via llama.cpp?
For the knowledge base part of NeMo Guardrails, it needs all your documents under a kb folder, and it currently supports only Markdown files.
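From what I saw, the config layout it expects is roughly this (file names are just examples):

```
config/
├── config.yml   # model + rails settings
├── flows.co     # Colang flows
└── kb/
    ├── doc_one.md
    └── doc_two.md
```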
No use, since my docs are in PDF format. Converting them to Markdown might result in loss of information
@WhiteFang_Jr But people are building RAG-supported chatbots; what are they using for general interactions?
For general interactions, you'd use something like an agent on top of your index (rough sketch below), or something like the context chat engine that was linked earlier.
Retrieving is generally much faster than an LLM call, I wouldn't worry too much about the time cost there
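For the agent route, a rough sketch (assuming `index` and `llm` are whatever you've already set up, and that you're on the newer llama_index.core layout):

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Wrap the existing index as a tool, so the agent only retrieves when the
# question actually needs the knowledge base.
kb_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="knowledge_base",
    description="Useful for questions about the documents in the knowledge base.",
)

agent = ReActAgent.from_tools([kb_tool], llm=llm, verbose=True)

print(agent.chat("Hi"))                       # should answer directly
print(agent.chat("What treats joint pain?"))  # should call the knowledge_base tool
```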
@Logan M The issue lies in the response for general interactions. If I say "Hi", it should reply back with something like "Hi, how can I help you?" instead of fetching something from the index
Or suppose I ask a question not related to the index, like "How many states are in America?"; it should know to answer from the LLM instead of the index
That's exactly what an agent does
but it requires some careful configuration to work nicely (prompts, tool descriptions). It's also slower since it involves multiple LLM calls.
That's why the context chat engines exist; they're much faster
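e.g. something like this (the prompt text is just an example, and `index` is your existing index):

```python
from llama_index.core.memory import ChatMemoryBuffer

# Context chat engine: retrieval happens on every turn, but the system prompt
# tells the LLM it can also just chat, so "Hi" gets a normal greeting even if
# some irrelevant context comes back from the retriever.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
    system_prompt=(
        "You are a chatbot able to have normal interactions, as well as answer "
        "questions about the documents. Ignore retrieved context when the user "
        "is just making small talk."
    ),
)

print(chat_engine.chat("Hi"))
```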