Hi guys,

I have already built a solution that creates custom AI chatbots from your own data. It uses gpt-3.5-turbo with RAG, fetching context via OpenAI embeddings and cosine-similarity matching.
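For reference, here is a minimal sketch of the cosine-similarity matching step described above, assuming the document and query embeddings have already been computed (e.g. with OpenAI's embeddings API). The vectors below are toy placeholders, and `retrieve` is a hypothetical helper name, not part of any library:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb, doc_embs, docs, top_k=3):
    # Rank documents by similarity to the query and return the top_k.
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    ranked = sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```

The retrieved chunks are then stuffed into the gpt-3.5-turbo prompt as context.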

I am running into issues when the user asks a follow-up to a previous question: running RAG on just the new question isn't sufficient, because it won't fetch the required context.

I already tried rephrasing the question based on conversation history, but it slows down responses, and it's difficult to decide which questions to rephrase and which to leave alone.

I went through the source code of chat llama_index and couldn't find anything different; from what I can see, follow-up questions aren't handled at all.

How will RAG work for this scenario?
Q: What is an Apple Watch?
A: bla bla
Q: What are its features?

I have tried chatbase.co and it handles this pretty well, but I don't know how they do it. Can someone please help me with this?
13 comments
Rephrasing (or relying on the LLM to write a new query) is really the only way.

You could retrieve based on the entire chat history, but that's problematic if the user changes topics.
From an engineering perspective, I really don't know of a better option.
OK, can I embed the last 10 messages and still get the right context?
Or maybe I can figure out some NLP technique or attention mechanism to get the right context with RAG.
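One way to try the "embed the last 10 messages" idea is to concatenate a sliding window of recent turns into a single string and embed that instead of the bare question. A sketch, with `build_retrieval_query` as a hypothetical helper (and with the caveat raised above: a mid-conversation topic change will still pollute this query):

```python
def build_retrieval_query(history, n=10):
    # history: list of (role, text) tuples, oldest first.
    # Join the last n messages into one string to embed for retrieval.
    recent = history[-n:]
    return "\n".join(f"{role}: {text}" for role, text in recent)
```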
I'm not aware of any fancier techniques yet.

Like, picture this chat history

Plain Text
user: What is an apple watch?
A: bla bla
user: What is a samsung fold 4?
A: bla bla
user: What colors does it come in?


How do I retrieve the right context at every step? Without an LLM rephrasing the question, it's nearly impossible 😅
I don't know what magic chatbase does. I've been stuck on this problem for a month now; I read the source code of multiple libraries and couldn't find anything.
Chatbase retrieves context for every query, so they don't use an agent; even for "hi" it retrieves context.
But it works so well for follow-up questions, for at least the last 5-6 messages.
If the LLM has to rephrase, I couldn't figure out how to decide when to do it.
@Logan M Can function calling help in this?
I think it's mostly prompt engineering for deciding when to rephrase.

i.e. "Repeat the user's message as the query, or rephrase it based on chat history if needed"
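That instruction can be folded into a single condense-question prompt, so the LLM itself decides whether to rephrase or repeat. A sketch of building such a prompt (the template wording and the `condense_prompt` helper are assumptions, not from any particular library; the returned string would then be sent to the chat model):

```python
def condense_prompt(history, question):
    # history: list of (role, text) tuples, oldest first.
    # Ask the LLM to rewrite a follow-up as a standalone question,
    # or repeat it verbatim if it already stands alone.
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Given the conversation below, rewrite the final user message as a "
        "standalone question. If it is already standalone, repeat it verbatim.\n\n"
        f"{transcript}\n\nFinal user message: {question}\nStandalone question:"
    )
```

The rewritten question is what gets embedded for retrieval, which is the extra LLM round-trip that makes this approach slower.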
Yeah. The other thing you can do is feed previously used sources forward in the conversation, or keep some sort of running summary that helps. But I think your best bet is to use an LLM to write a new query based on the history.