Hello, I'm looking at the context chat engine for a RAG chat application and noticing that only the last user message is embedded to query the db. Can I use a query engine (of which there are many more choices) to get the same effect as a chat engine by manually handling the chat history and such? Is there a good tutorial for this?
Attachment: image.png
One thing you will run into is token limits when putting that chat history into the query engine: the longer the chat history in the template, the less room is left for context from the query engine.

But basically you would want to set the text_qa_template and refine_template to include the chat history (see the sketch below).

https://docs.llamaindex.ai/en/stable/examples/customization/prompts/chat_prompts.html
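For example, a minimal sketch of that approach, assuming a recent LlamaIndex release (import paths vary across versions) and a ./data folder of documents; the chat history contents here are made up for illustration, and the hyphen-delimited context block mirrors the default QA prompt:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.prompts import ChatPromptTemplate

# Build (or load) an index as usual.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Prior turns of the conversation, maintained manually by the app.
chat_history = [
    ChatMessage(role=MessageRole.USER, content="What does the report cover?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Q3 earnings and outlook."),
]

# QA template: system message, then the chat history, then the usual
# context + question block with {context_str} / {query_str} slots.
qa_messages = (
    [ChatMessage(role=MessageRole.SYSTEM,
                 content="Answer using only the provided context.")]
    + chat_history
    + [ChatMessage(
        role=MessageRole.USER,
        content=(
            "Context information is below.\n"
            "---------------------\n"
            "{context_str}\n"
            "---------------------\n"
            "Given the context and the conversation so far, "
            "answer this question: {query_str}\n"
        ),
    )]
)
text_qa_template = ChatPromptTemplate(qa_messages)

# The template is forwarded to the response synthesizer.
query_engine = index.as_query_engine(text_qa_template=text_qa_template)
response = query_engine.query("How did revenue change?")
print(response)
```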
OK, nice, so there are some classes to help build chat history into templates; that's helpful. I'd just build a new query engine for each query then...
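If rebuilding the engine every turn feels wasteful, I believe you can also swap the template on an existing engine with update_prompts (the prompt key below is the one used in the LlamaIndex prompt customization docs); continuing the sketch above:

```python
# Reuse one query engine across turns: rebuild only the template with
# the latest chat history, then update the engine in place.
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": text_qa_template}
)
```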
If the token limit is a problem here, then it will be a problem in the Context Chat engine too, so 🀷
IIRC, chat vs. query would use different endpoints of the LLM: the chat endpoint on OpenAI, for example, takes a structured list of messages, whereas the completion endpoint takes a single string that self-describes itself as a chat.

If that's the case, I wonder whether there could be a performance hit.
They do use different endpoints actually πŸ‘€ For example though, GPT-3.5/4 only works with the chat endpoint, so llm.complete() actually calls the chat endpoint

In general though, query engines call llm.complete() and chat engines call llm.chat()
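To make the distinction concrete, a rough sketch of the two call styles (the model name and prompts are placeholders, and import paths vary by LlamaIndex version):

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")

# Completion-style: one self-contained prompt string.
completion = llm.complete("Summarize the Q3 report in one sentence.")
print(completion.text)

# Chat-style: a structured list of role-tagged messages.
chat_response = llm.chat([
    ChatMessage(role="system", content="You are a terse analyst."),
    ChatMessage(role="user", content="Summarize the Q3 report in one sentence."),
])
print(chat_response.message.content)
```

For chat-only models like the GPT-3.5/4 family, both calls end up hitting the chat endpoint, as noted above.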
Oh, interesting (you meant they do NOT use different endpoints here, right?)
Thanks @Logan M, this is exactly what I needed to break out of the Context Chat engine's restrictions. Also, Arize Phoenix has better support for the query engine.
Attachment: image.png
Oh interesting. We actually worked a ton with Arize to make the integration work well haha

Any feedback on it?
I just started using it today and it's amazing. Stuff that I used to sift through the debug logs for is right there, all organized.

  • Didn't work with v839, reinstalled v836 (but didn't investigate).
  • Wish it would show me, in addition to what it already does, the actual LLM endpoint call.
  • I didn't notice if import phoenix had a way of detecting whether it's running already or not, but that would be useful.
  • The ContextChat does not show as much detail as a basic query engine; no chain, retrieve, synthesis, just embedding and llm fields. See screenshot for example.
Attachment: image.png
It's almost the same information, but presented better for the query engine.

Edit: the output is missing, I think?
Also occasional bugs πŸ™‚
Fantastic feedback! The context chat engine probably just needs more callback hooks πŸ‘ and yeah, likely the odd bug haha

Glad that it's proving useful πŸ™
Also, not having to refresh Phoenix to see the latest trace would be great.
Also, Phoenix seems to eat streaming chat responses.
Hey @skittythecat, maintainer of Phoenix here. This will be in an update today!
"I didn't notice if import phoenix had a way of detecting whether it's running already or not, but that would be useful."
There is! You can always retrieve the active session via px.active_session()
https://docs.arize.com/phoenix/api/session#phoenix.session
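For reference, a small sketch of that check based on the linked API docs (launch the app only when no session is active):

```python
import phoenix as px

# Reuse a running Phoenix session if there is one; otherwise start one.
session = px.active_session()
if session is None:
    session = px.launch_app()

print(session.url)  # link to the Phoenix UI
```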
@skittythecat Also a maintainer of Phoenix here. Thanks for trying it out, and thanks for the feedback!

Can you tell us more about what you mean by this bullet?

"Wish it would show me, in addition to what it already does, the actual LLM endpoint call"

Do you have in mind the exact endpoint, request payload, and response?
Yes, so that I can retry it in Postman, for example.
Great to know, thanks!
Not a primary use case though. Just being able to see the context sent to the LLM is incredibly useful and generally speeds up the design of my LlamaIndex system. It saves my eyes from having to peruse detailed debug logs, and saves the trouble of copying these things to text files, etc.