Hey guys, I was hoping to upgrade our chat engine by using VectorIndexAutoRetriever (so we can filter results in Qdrant with the retriever prior to generating responses). However, I realized it's no longer as simple as index.as_chat_engine() since we now have a retriever instead of an index. Anyone know best practices for combining the two? Or would I have to create my own chat pipeline?
You will need to add some customization to your pipeline. Here is an example that kind of shows what you're trying to do. https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_vs_recursive_retriever/
But you will still need an index; the retriever works alongside your index to determine which document(s) in your index fit the criteria of your query.
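For illustration, here's a minimal sketch of that pairing with Qdrant. The collection name, metadata fields, and connection details below are hypothetical placeholders:
Python
# Sketch only -- assumes an existing Qdrant collection with a "category" metadata field.
from qdrant_client import QdrantClient
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")  # placeholder URL
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")  # hypothetical collection
index = VectorStoreIndex.from_vector_store(vector_store)

# Describe the content and metadata so the LLM can infer filters from the user's query.
vector_store_info = VectorStoreInfo(
    content_info="Internal documents",
    metadata_info=[
        MetadataInfo(name="category", type="str", description="Document category"),  # example field
    ],
)

retriever = VectorIndexAutoRetriever(
    index,
    vector_store_info=vector_store_info,
    similarity_top_k=5,
)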
Awesome, that's super helpful - thank you Jake!
After reading it through, it seems like I'll likely have to create a custom chat engine class to use the returned results. So, something like this: https://github.com/run-llama/llama_index/discussions/15117 (but instead of the query transforms, I'd update the pipeline with the retrievers). Does that seem right or maybe I'm missing something?
You can plug any retriever into a context chat engine (which is one of the options that as_chat_engine was doing for you)
Python
from llama_index.core.chat_engine import CondensePlusContextChatEngine

chat_engine = CondensePlusContextChatEngine.from_defaults(retriever, llm=llm, ...)
Or you could build something more from scratch using workflows if you need full customization
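To make that concrete, here's a hedged sketch of wiring the auto-retriever from above into the context chat engine and querying it, assuming an llm instance is already configured (the memory settings and question are illustrative):
Python
# Sketch -- assumes `retriever` and an `llm` instance already exist.
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    llm=llm,
    memory=ChatMemoryBuffer.from_defaults(token_limit=4000),  # illustrative limit
)

response = chat_engine.chat("What changed in the latest release?")  # example question
print(response.response)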
Ahh okay thank you Logan, I hadn't looked into workflows at all yet.

And that's perfect, that's the conclusion I've come to as well. Just so y'all are aware, the base CondensePlusContextChatEngine returns the following error when run with AzureOpenAI (regardless of what my user query is, from tests so far):
BadRequestError('Error code: 400 - {\'error\': {\'message\': "The response was filtered due to the prompt triggering Azure OpenAI\'s content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", \'type\': None, \'param\': \'prompt\', \'code\': \'content_filter\', \'status\': 400, \'innererror\': {\'code\': \'ResponsibleAIPolicyViolation\', \'content_filter_result\': {\'hate\': {\'filtered\': False, \'severity\': \'safe\'}, \'jailbreak\': {\'filtered\': True, \'detected\': True}, \'self_harm\': {\'filtered\': False, \'severity\': \'safe\'}, \'sexual\': {\'filtered\': False, \'severity\': \'safe\'}, \'violence\': {\'filtered\': False, \'severity\': \'safe\'}}}}}')

So I had to override the default condense prompt, but otherwise, it's working like a charm!
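For reference, a minimal sketch of overriding the condense prompt; the prompt wording below is a placeholder, not the exact prompt used here:
Python
# Sketch -- placeholder prompt text; `retriever` and `llm` are assumed to exist.
from llama_index.core.chat_engine import CondensePlusContextChatEngine

CUSTOM_CONDENSE_PROMPT = (
    "Given the following conversation and a follow-up message, rephrase the "
    "follow-up message as a standalone question.\n\n"
    "Chat History:\n{chat_history}\n\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    llm=llm,
    condense_prompt=CUSTOM_CONDENSE_PROMPT,
)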
Okay, last question whenever y'all have the time: do you know if there's a way to also attach something like the Sub Question Query Engine or any method of query decomposition to CondensePlusContextChatEngine? Thank y'all πŸ™Œ
For that, I think you'd need to use a custom retriever to implement that process πŸ€”
There's no sub-question retriever sadly.

I'm not sure how that would work either, actually, since you'd end up retrieving a lot of data. I guess you'd need some last step to filter the chunks.
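One way to sketch that custom-retriever idea: decompose the query with the LLM, retrieve for each sub-question, then dedupe and keep the top-scoring chunks as the filtering step. Everything below (the class, the decomposition prompt, the max_nodes cutoff) is a hypothetical illustration, not a built-in component:
Python
# Sketch of a hypothetical sub-question retriever -- not a built-in LlamaIndex component.
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle


class SubQuestionRetriever(BaseRetriever):
    """Decompose a query with the LLM, retrieve per sub-question, then merge and dedupe."""

    def __init__(self, retriever, llm, max_nodes: int = 10):
        super().__init__()
        self._retriever = retriever
        self._llm = llm
        self._max_nodes = max_nodes

    def _decompose(self, query: str) -> list[str]:
        # Naive decomposition: ask the LLM for one sub-question per line.
        prompt = (
            "Break the following question into 2-4 simpler sub-questions, "
            f"one per line:\n{query}"
        )
        lines = self._llm.complete(prompt).text.splitlines()
        return [line.strip("- ").strip() for line in lines if line.strip()] or [query]

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        seen: dict[str, NodeWithScore] = {}
        for sub_q in self._decompose(query_bundle.query_str):
            for node in self._retriever.retrieve(sub_q):
                seen.setdefault(node.node.node_id, node)
        # The "last step to filter the chunks": keep the highest-scoring unique nodes.
        ranked = sorted(seen.values(), key=lambda n: n.score or 0.0, reverse=True)
        return ranked[: self._max_nodes]

An instance of this could then be passed into CondensePlusContextChatEngine.from_defaults in place of the plain retriever.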
I think my current plan is to override the chat or _run_c3 methods in the CondensePlusContextChatEngine class to first decompose the query and then recombine everything as the full context before giving it to the LLM to respond. Would love to hear if you think this is silly at all πŸ˜…

Couple of follow ups to that:
  • Do you think I'll also need to rewrite these for the async calls if I'm not using async?
  • Also, why is it called c3? πŸ‘€
Either way, thanks for responding so quickly (on a weekend no less!). It's really expedited my whole process today πŸ˜„
I think that's a totally reasonable approach!

No need to overwrite the async methods if you aren't using them (although if you ever plan to run this on something like a FastAPI server, then you'll want async)

It's called c3 because the original contributor wanted to call it the C3 chat engine (context condense chat engine, three c's) -- I thought the current name was more descriptive lol
That's super helpful! It's already running on FastAPI, so I would've lost my mind trying to figure out where I went wrong lol. And that's fun to know hahah, thanks a ton! πŸ˜„