I'm working on building a callback handler to integrate Langfuse (LLM observability) with Llama Index (see
https://github.com/langfuse/langfuse/issues/188), but I'm running into issues with missing callback events from a chat engine.
I have a basic Llama Index chat engine (chat mode is `context`) that queries Pinecone and tries to answer questions based on the retrieved context. However, I've noticed that the chat engine isn't emitting certain callbacks, specifically the retrieval-related ones. I hooked up the debug handler and pasted its output below. I know for a fact (via print statements) that Pinecone is being hit when chatting with the chat engine.
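For reference, here's roughly how I'm wiring things up (a minimal sketch; `vector_store` stands in for my actual Pinecone setup, and the question strings are placeholders):

```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# Attach the debug handler so every CBEventType gets traced and printed
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([debug_handler])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

# `vector_store` is my Pinecone-backed vector store (setup omitted here)
index = VectorStoreIndex.from_vector_store(
    vector_store, service_context=service_context
)

# Query path: emits QUERY / RETRIEVE / EMBEDDING / SYNTHESIZE / LLM events
query_engine = index.as_query_engine()
query_engine.query("What does the context say about X?")

# Chat path: only an LLM event shows up, even though retrieval happens
chat_engine = index.as_chat_engine(chat_mode="context")
chat_engine.chat("What does the context say about X?")
```

Both calls hit Pinecone, but only the query engine's trace includes the retrieval events, as shown in the output below.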
query callbacks:
**********
Trace: query
|_CBEventType.QUERY -> 3.858602 seconds
|_CBEventType.RETRIEVE -> 1.956498 seconds
|_CBEventType.EMBEDDING -> 0.702143 seconds
|_CBEventType.SYNTHESIZE -> 1.901565 seconds
|_CBEventType.TEMPLATING -> 5e-05 seconds
|_CBEventType.LLM -> 1.889486 seconds
**********
versus
chat callbacks:
**********
Trace: chat
|_CBEventType.LLM -> 7.187319 seconds
**********
Anyone know why this might be happening?