Has anyone ever used OpenInferenceCallbackHandler with the condense_plus_context chat engine? In the log, the response text is the output of my DEFAULT_CONDENSE_PROMPT, but in the chat window I see the response to the system prompt.
Yes. As far as I understand, condense_plus_context first uses the chat history and the new query to condense them into a standalone question via an LLM call, and that is the point at which OpenInferenceCallbackHandler updates the log.
It then uses the retrieved nodes and the system prompt to generate the chat response. The output of chat_engine is that final response, while the log shows the condense-prompt output.
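A minimal sketch of the setup being described, assuming a LlamaIndex ~0.9-style API (import paths changed in later versions, where the handler ships in the llama-index-callbacks-openinference integration package); the data directory and the question are placeholders:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler

# Wire the OpenInference handler into the callback manager so LLM calls
# made by the chat engine are buffered for logging.
callback_handler = OpenInferenceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler])
)

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# condense_plus_context makes two LLM calls per turn:
#   1. condense: (chat history + new message) -> standalone question
#   2. respond:  (system prompt + retrieved context + question) -> chat answer
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")

response = chat_engine.chat("What does the document say about X?")
print(response)  # the final, context-grounded answer shown in the chat UI

# Inspect what the handler buffered; per the OpenInference example notebook
# this drains the query data recorded so far. The behavior reported above is
# that the entry's response text comes from the condense step, not the final
# answer.
query_data = callback_handler.flush_query_data_buffer()
for item in query_data:
    print(item)
```

Since both LLM calls run under the same callback manager, the mismatch the question describes would mean the buffered "response" field is keyed to the first (condense) LLM event of the turn rather than the final synthesis, which matches what the log shows.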