Can someone please explain the distinction between chat mode and query mode to me? Initially, I assumed the only difference was that chat mode retains the previous messages while the underlying process stays the same: context is provided, retrieval is performed using embeddings, and the top-k most relevant chunks are sent to the LLM. However, comparing the outputs of the two modes reveals real differences. Notably, chat mode seems to incorporate a significant amount of out-of-context information, likely drawn from the LLM's own training knowledge (OpenAI's model), which leads to longer responses.
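
For concreteness, here is roughly how I am comparing the two. This is a minimal sketch assuming the LlamaIndex `as_query_engine` / `as_chat_engine` APIs with a placeholder `data` directory and the default OpenAI LLM; depending on your llama-index version the imports may come from `llama_index` rather than `llama_index.core`:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Build a vector index over local documents ("data" is a placeholder path)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query mode: stateless; retrieve the top-k chunks and answer from them
query_engine = index.as_query_engine(similarity_top_k=2)
print(query_engine.query("What does the document say about X?"))

# Chat mode: keeps conversation history; the chat_mode setting controls
# how retrieval is combined with the LLM's own knowledge
chat_engine = index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What does the document say about X?"))
```

My understanding is that the answer may also depend on which `chat_mode` is in play (e.g. `condense_question` rewrites the question before retrieving, while other modes let the LLM answer more freely), so if the two behave differently by design, I would appreciate a pointer to where that happens.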