Has anyone ever used OpenInferenceCallbackHandler with the condense_plus_context chat engine? In the log, the response text is the output of my DEFAULT_CONDENSE_PROMPT, but in the chat window I see the response to the system prompt.
Yes. As far as I understand, condense_plus_context first uses the chat history and the new query to condense them into a standalone question via an LLM call, and that is the point at which OpenInferenceCallbackHandler updates the log.
It then uses the retrieved nodes and the system prompt to generate the chat response. The output of chat_engine is that final response, while the log shows the condense-prompt output.
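A minimal sketch of the setup being described, assuming a LlamaIndex ~0.9-style API (import paths changed in later versions, where the handler ships in the llama-index-callbacks-openinference integration package); the data directory and the question are placeholders:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler

# Wire the OpenInference handler into the callback manager so LLM calls
# made by the chat engine are buffered for logging.
callback_handler = OpenInferenceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler])
)

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# condense_plus_context makes two LLM calls per turn:
#   1. condense: (chat history + new message) -> standalone question
#   2. respond:  (system prompt + retrieved context + question) -> chat answer
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")

response = chat_engine.chat("What does the document say about X?")
print(response)  # the final, context-grounded answer shown in the chat UI

# Inspect what the handler buffered; per the OpenInference example notebook
# this drains the query data recorded so far. The behavior reported above is
# that the entry's response text comes from the condense step, not the final
# answer.
query_data = callback_handler.flush_query_data_buffer()
for item in query_data:
    print(item)
```

Since both LLM calls run under the same callback manager, the mismatch the question describes would mean the buffered "response" field is keyed to the first (condense) LLM event of the turn rather than the final synthesis, which matches what the log shows.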