Hey, thanks, I will try this. Anyway, I have been using the chat engine for the chat history part, so I guess this should work
They allow off-the-data conversation
I am using an open-source LLM, so I am not sure whether I can use the OpenAI one.
For condense plus context mode:
This is from the LlamaIndex docs:
This is a multi-step chat mode built on top of a retriever over your data.
For each chat interaction:
First condense a conversation and latest user message to a standalone question
Then build a context for the standalone question from a retriever,
Then pass the context along with prompt and user message to LLM to generate a response.
This approach is simple, and works for questions directly related to the knowledge base and general interactions.
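For reference, the basic setup looks roughly like this (the data path and token limit are just placeholders, the imports assume the newer llama_index.core package layout, and the LLM/embedding model are assumed to be configured separately, e.g. a local Mistral Instruct via llama.cpp):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.memory import ChatMemoryBuffer

# Load documents and build a vector index over them.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Condense-plus-context chat engine with a chat history buffer.
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
)

print(chat_engine.chat("Hi"))
```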
I don't think condense plus context gives an option to not retrieve from the vector index
It does say that it handles general interactions
But it still retrieves from the index
Just sharing an example below:
User Query 1 : Hi
Output 1 : Based on the information provided, it seems that Rt, who can be reached at rt@mit.edu, recommends Physical Therapy for treating joint pain. This therapy can help improve their physical abilities, coordination, balance, and motor skills.
User Query 2 : Hi Thank you for answering
Output 2 : You're welcome! If you have any other questions, feel free to ask and I'll do my best to help. Good luck with your therapy! If you're looking for more information on Physical Therapy, I'd be happy to provide resources and suggestions for activities you can do at home. Let me know if you'd like any specific recommendations.
it picks up stuff from the vector index
True yeah, it is retrieving from the index first
You can try checking with the LLM before passing the query to the bot and see if that works
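Roughly something like this (`needs_retrieval` and `handle` are just hypothetical helpers, not LlamaIndex APIs; `llm` and `chat_engine` are whatever you already have set up):

```python
from llama_index.core.llms import ChatMessage

def needs_retrieval(llm, user_message: str) -> bool:
    # Ask the LLM to classify whether the message needs the knowledge base.
    prompt = (
        "Answer with only YES or NO. Does the following message require "
        f"looking up information in a document knowledge base?\n\n{user_message}"
    )
    reply = llm.chat([ChatMessage(role="user", content=prompt)])
    return "YES" in reply.message.content.upper()

def handle(llm, chat_engine, user_message: str) -> str:
    if needs_retrieval(llm, user_message):
        return str(chat_engine.chat(user_message))
    # Plain LLM reply for greetings / general questions, no retrieval.
    return llm.chat([ChatMessage(role="user", content=user_message)]).message.content
```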
Hmm, that's my last resort, since it will add latency and compute cost
Have you come across guardrails?
I think the guardrails implementation in LlamaIndex is for output support
But there is something called nemo-guardrails
Haven't checked it or tried it yet, so I can't say much on this
Does NeMo integrate with LlamaIndex and llama.cpp, considering I am using Mistral Instruct via llama.cpp?
For the knowledge base part of NeMo Guardrails, it needs all your documents under a kb folder, and it currently supports only Markdown files.
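From what I saw, the config layout it expects is roughly this (file names are just examples):

```
config/
├── config.yml   # model + rails settings
├── flows.co     # Colang flows
└── kb/
    ├── doc_one.md
    └── doc_two.md
```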
No use, since my docs are in PDF format. Converting them to Markdown might result in loss of information
@WhiteFang_Jr But people are building RAG-supported chatbots; what are they using for general interactions?
For general interactions, you'd use something like an agent on top of your index (rough sketch below), or something like the context chat engine that was linked earlier.
Retrieving is generally much faster than an LLM call, I wouldn't worry too much about the time cost there
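For the agent route, a rough sketch (assuming `index` and `llm` are whatever you've already set up, and that you're on the newer llama_index.core layout):

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Wrap the existing index as a tool, so the agent only retrieves when the
# question actually needs the knowledge base.
kb_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="knowledge_base",
    description="Useful for questions about the documents in the knowledge base.",
)

agent = ReActAgent.from_tools([kb_tool], llm=llm, verbose=True)

print(agent.chat("Hi"))                       # should answer directly
print(agent.chat("What treats joint pain?"))  # should call the knowledge_base tool
```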
@Logan M The issue lies in the response for general interactions. If I say "Hi", it should reply back with something like "Hi, how can I help you?" instead of fetching something from the index
Or suppose I ask a question not related to the index, like "How many states are in America?"; it should know to answer from the LLM instead of the index
That's exactly what an agent does
but it requires some careful configuration to work nicely (prompts, tool descriptions). It's also slower since it involves multiple LLM calls.
That's why the context chat engines exist; they're much faster
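e.g. something like this (the prompt text is just an example, and `index` is your existing index):

```python
from llama_index.core.memory import ChatMemoryBuffer

# Context chat engine: retrieval happens on every turn, but the system prompt
# tells the LLM it can also just chat, so "Hi" gets a normal greeting even if
# some irrelevant context comes back from the retriever.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
    system_prompt=(
        "You are a chatbot able to have normal interactions, as well as answer "
        "questions about the documents. Ignore retrieved context when the user "
        "is just making small talk."
    ),
)

print(chat_engine.chat("Hi"))
```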