Hi, anyone know how LlamaIndex's chat engine works? Specifically, does it query the index on each user interaction and then use the configured LLM to produce a response, or does it figure out whether the answer to a new user query is already contained in the chat history (including any context retrieved from the index previously)?
Got it, thanks, this was really helpful. I was using "openai" mode thinking that the index would be queried for context and then the OpenAI LLM used to synthesize the response. But it looks like that's not 100% accurate, and maybe I should be using condense_plus_context instead.
Yea! openai is basically an agent -- it will decide to respond directly, or to use the query engine to help it respond
condense_plus_context will always use the index for information. This will be much faster (fewer LLM calls), but maybe a little less customizable. Trade-offs with each mode, I suppose
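If it helps, here's a minimal sketch of wiring up both modes. This assumes a recent llama-index install where everything lives under `llama_index.core` (import paths differ in older versions), an `OPENAI_API_KEY` in the environment for the default LLM, and `./data` is just a placeholder directory:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a vector index over local documents ("./data" is a placeholder)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# "openai" mode: a function-calling agent that decides per turn whether
# to answer from the chat history directly or to call the query engine
# as a tool (requires an OpenAI function-calling model)
agent_chat = index.as_chat_engine(chat_mode="openai")

# "condense_plus_context" mode: condenses chat history + the new message
# into a standalone question, retrieves from the index on EVERY turn,
# and injects the retrieved context before answering
cpc_chat = index.as_chat_engine(chat_mode="condense_plus_context")

print(cpc_chat.chat("What does the document say about chat engines?"))
```

So with condense_plus_context you're guaranteed a retrieval step each turn, while the openai agent may skip it entirely when it thinks the history already has the answer.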
Got it. Actually, I had another question, not pertaining to the index but regarding UnstructuredElementNodeParser -- it specifically turns off the use of the embedding model. Is there a reason for that, do you know?
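For reference, this is roughly how I'm invoking it (a sketch, assuming recent versions where the parser lives under `llama_index.core.node_parser`; it also needs the `unstructured` and `lxml` packages installed):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import UnstructuredElementNodeParser

# Load documents with embedded tables (e.g. HTML); "./data" is a placeholder
documents = SimpleDirectoryReader("./data").load_data()

# Split them into text nodes plus table elements with LLM-generated summaries
parser = UnstructuredElementNodeParser()
nodes = parser.get_nodes_from_documents(documents)
```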