Hello, I'm looking at the context chat engine for a RAG chat application and noticing that only the last user message is embedded to query the db. Can I use a query engine (of which there are many more choices) to get the same effect as a chat engine by manually handling the chat history and such? Is there a good tutorial for this?
Attachment: image.png
One thing you will run into is token limits when putting that chat history into the query engine: the longer the chat history in the template, the less room is left for context from the query engine.

But basically you would want to set the text_qa_template and refine_template to include the chat history (see the sketch below).

https://docs.llamaindex.ai/en/stable/examples/customization/prompts/chat_prompts.html
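For example, a minimal sketch of that approach, assuming a recent LlamaIndex release (import paths vary across versions) and a ./data folder of documents; the chat history contents here are made up for illustration, and the hyphen-delimited context block mirrors the default QA prompt:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.prompts import ChatPromptTemplate

# Build (or load) an index as usual.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Prior turns of the conversation, maintained manually by the app.
chat_history = [
    ChatMessage(role=MessageRole.USER, content="What does the report cover?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Q3 earnings and outlook."),
]

# QA template: system message, then the chat history, then the usual
# context + question block with {context_str} / {query_str} slots.
qa_messages = (
    [ChatMessage(role=MessageRole.SYSTEM,
                 content="Answer using only the provided context.")]
    + chat_history
    + [ChatMessage(
        role=MessageRole.USER,
        content=(
            "Context information is below.\n"
            "---------------------\n"
            "{context_str}\n"
            "---------------------\n"
            "Given the context and the conversation so far, "
            "answer this question: {query_str}\n"
        ),
    )]
)
text_qa_template = ChatPromptTemplate(qa_messages)

# The template is forwarded to the response synthesizer.
query_engine = index.as_query_engine(text_qa_template=text_qa_template)
response = query_engine.query("How did revenue change?")
print(response)
```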
OK, nice, so there are some classes to help build chat history into templates; that's helpful. I'd just build a new query engine for each query then...
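If rebuilding the engine every turn feels wasteful, I believe you can also swap the template on an existing engine with update_prompts (the prompt key below is the one used in the LlamaIndex prompt customization docs); continuing the sketch above:

```python
# Reuse one query engine across turns: rebuild only the template with
# the latest chat history, then update the engine in place.
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": text_qa_template}
)
```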
If the token limit is a problem here, then it will be a problem in the Context Chat engine too, so 🀷
IIRC, chat vs. query would use different endpoints of the LLM: the chat endpoint on OpenAI, for example, takes a structured list of messages, whereas the completion endpoint takes a single string that self-describes itself as a chat.

If that's the case, I wonder whether there could be a performance hit.
They do use different endpoints actually πŸ‘€ For example though, GPT-3.5/4 only works with the chat endpoint, so llm.complete() actually calls the chat endpoint

In general though, query engines call llm.complete() and chat engines call llm.chat()
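To make the distinction concrete, a rough sketch of the two call styles (the model name and prompts are placeholders, and import paths vary by LlamaIndex version):

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")

# Completion-style: one self-contained prompt string.
completion = llm.complete("Summarize the Q3 report in one sentence.")
print(completion.text)

# Chat-style: a structured list of role-tagged messages.
chat_response = llm.chat([
    ChatMessage(role="system", content="You are a terse analyst."),
    ChatMessage(role="user", content="Summarize the Q3 report in one sentence."),
])
print(chat_response.message.content)
```

For chat-only models like the GPT-3.5/4 family, both calls end up hitting the chat endpoint, as noted above.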
Oh, interesting (you meant they do NOT use different endpoints here, right?)
Thanks @Logan M, this is exactly what I needed to break out of the Context Chat engine's restrictions. Also, Arize Phoenix has better support for the query engine.
Attachment: image.png
Oh interesting. We actually worked a ton with Arize to make the integration work well haha

Any feedback on it?
I just started using it today and it's amazing. Stuff that I used to sift through the debug logs for is right there, all organized.

  • Didn't work with v839, reinstalled v836 (but didn't investigate).
  • Wish it would show me, in addition to what it already does, the actual LLM endpoint call.
  • I didn't notice if import phoenix had a way of detecting whether it's running already or not, but that would be useful.
  • The ContextChat does not show as much detail as a basic query engine; no chain, retrieve, synthesis, just embedding and llm fields. See screenshot for example.
Attachment: image.png
It's almost the same information, but presented better for the query engine.

Edit: the output is missing, I think?
Also occasional bugs πŸ™‚
Fantastic feedback! The context chat engine probably just needs more callback hooks πŸ‘ and yeah, likely the odd bug haha

Glad that it's proving useful πŸ™
Also, not having to refresh Phoenix to see the latest trace would be great.
Also, Phoenix seems to eat streaming chat responses.
Hey @skittythecat, maintainer of Phoenix here. This will be in an update today!
"I didn't notice if import phoenix had a way of detecting whether it's running already or not, but that would be useful."
There is! You can always retrieve the active session via px.active_session()
https://docs.arize.com/phoenix/api/session#phoenix.session
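For reference, a small sketch of that check based on the linked API docs (launch the app only when no session is active):

```python
import phoenix as px

# Reuse a running Phoenix session if there is one; otherwise start one.
session = px.active_session()
if session is None:
    session = px.launch_app()

print(session.url)  # link to the Phoenix UI
```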
@skittythecat Also a maintainer of Phoenix here. Thanks for trying it out, and thanks for the feedback!

Can you tell us more about what you mean by this bullet?

"Wish it would show me, in addition to what it already does, the actual LLM endpoint call"

Do you have in mind the exact endpoint, request payload, and response?
Yes, so that I can retry it in Postman, for example.
Great to know, thanks!
Not a primary use case though. Just being able to see the context sent to the LLM is incredibly useful and generally speeds up the design of my LlamaIndex system. It saves my eyes from having to peruse detailed debug logs, and saves the trouble of copying these things to text files, etc.