How does LlamaIndex's chat engine work?

Hi, anyone know how LlamaIndex's chat engine works? Specifically, does it query the index for each user interaction and then use the configured LLM to produce a response, or does it figure out whether the answer to a new user query is already contained in the chat history (including any context retrieved from the index previously)?
Got it, thanks, this was really helpful. I was using "openai" mode thinking that the index would be queried for context and then the OpenAI LLM used to synthesize the response. But it looks like that's not 100% accurate, and maybe I should be using condense_plus_context instead.
Thanks for the quick reply
Yea! openai is basically an agent -- it will decide to respond directly, or to use the query engine to help it respond 🙂

condense_plus_context will always use the index for information. This will be much faster (fewer LLM calls), but maybe a little less customizable. Trade-offs with each mode, I suppose.
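For anyone landing here later, a minimal sketch of wiring up both modes (assuming current llama_index.core import paths, an OPENAI_API_KEY in the environment, and a placeholder ./data folder):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# "openai" mode wraps the query engine as a tool behind an OpenAI agent:
# the agent may answer from chat history alone, or call the query engine
# when it decides retrieval is needed.
agent_chat = index.as_chat_engine(chat_mode="openai")

# "condense_plus_context" condenses the new message plus chat history into
# a standalone question, retrieves from the index on every turn, and
# synthesizes a response from that retrieved context.
cpc_chat = index.as_chat_engine(chat_mode="condense_plus_context")

print(cpc_chat.chat("What does the report say about Q3 revenue?"))
```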
Got it. Actually, I had another question, not pertaining to the index but regarding UnstructuredElementNodeParser -- it specifically turns off the embedding model. Is there a reason for that, do you know?
Since it's a summary index, it doesn't use embeddings at all.

Without setting that to None, it would default to initializing OpenAI embeddings, which raises an error if the user doesn't have an API key set.
So, nothing to worry about
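To illustrate, a rough usage sketch (assumes the `unstructured` package is installed and `documents` loaded as in the earlier snippet):

```python
from llama_index.core.node_parser import UnstructuredElementNodeParser

# The parser builds a SummaryIndex over extracted table elements and asks
# an LLM to summarize each one. A SummaryIndex simply iterates over its
# nodes, so no embedding model is ever queried -- which is why the parser
# disables it rather than letting a default OpenAI embedding model
# initialize (and fail for users without an API key).
node_parser = UnstructuredElementNodeParser()
nodes = node_parser.get_nodes_from_documents(documents)
```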
Okay, got it. Thanks for the clarifications.