Hi everyone, I'm experimenting with the Claude model via Bedrock in LlamaIndex, instantiated like this: Bedrock(model="anthropic.claude-v2"). When I inspect the generated prompts using token_counter.llm_token_counts, I notice two duplicate prompt events. Does anyone have an idea why two identical, back-to-back prompt calls are being sent to the LLM?
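Here's roughly how I'm wiring it up (a minimal sketch of my setup, not the exact code: the "./data" path and the query are placeholders, and I'm on the older ServiceContext-style imports):

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms import Bedrock

# Token counting callback so I can see each prompt/completion event
token_counter = TokenCountingHandler()
callback_manager = CallbackManager([token_counter])

llm = Bedrock(model="anthropic.claude-v2")
service_context = ServiceContext.from_defaults(
    llm=llm, callback_manager=callback_manager
)

# Placeholder data directory and question
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
response = index.as_query_engine().query("What does the document say about X?")

# I expected one prompt event per LLM call, but two identical prompts show up here
for event in token_counter.llm_token_counts:
    print(event.prompt_token_count, event.prompt[:200])
```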
Can someone please explain the distinction between chat mode and query mode to me? Initially, I believed the only difference was that chat mode retains previous messages, while the underlying process stays the same: context is provided, retrieval is performed using embeddings, and the top-k most relevant results are sent to the LLM. However, comparing the outputs of the two modes reveals differences. Notably, chat mode seems to incorporate a significant amount of out-of-context information, likely sourced from OpenAI's built-in knowledge rather than my documents, leading to longer responses.
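To be concrete, this is the kind of comparison I mean (a rough sketch with default settings; the "./data" path and the question are placeholders):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())

# "Query mode": single-shot retrieval over my documents, top-k chunks sent to the LLM
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("Summarize the refund policy."))

# "Chat mode": same index, but through the chat engine (keeps message history)
chat_engine = index.as_chat_engine()
print(chat_engine.chat("Summarize the refund policy."))
# The chat answer is noticeably longer and includes details that aren't in my documents.
```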