Ah, you only used llama-index for the loader.
Do you know how the chat history works with this example? I tried to look at the langchain source code, but I don't really understand the path it takes to include chat history.
It would take some work to migrate. Ignoring the custom template for now, you could do something like this:
```python
import faiss
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.faiss import FaissVectorStore

# Load every file under the sources directory
loader = SimpleDirectoryReader(directory_manager.sources_dir, recursive=True, exclude_hidden=True)
documents = loader.load_data()

# Dimensionality of text-embedding-ada-002, the default embedding model
d = 1536
faiss_index = faiss.IndexFlatL2(d)

# Embed and index the documents into FAISS
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Chat engine that condenses the conversation into a query before retrieving
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")

while True:
    msg = input(">>: ").strip()
    response = chat_engine.chat(msg)
    print(str(response))
```
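On the chat history question: with this engine the history lives in a memory buffer on the chat engine itself, so there's nothing extra to wire up. A quick sketch for inspecting or clearing it (I believe these accessors are on the base chat engine class, but double-check against your installed version):

```python
# The engine accumulates the conversation in an internal memory buffer.
print(chat_engine.chat_history)  # list of ChatMessage objects exchanged so far
chat_engine.reset()              # wipe the history to start a fresh conversation
```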
By default, this is using OpenAI embeddings and gpt-3.5-turbo. Documents get chunked using a `SentenceSplitter` at a chunk size of 1024 tokens.
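If you want different defaults, you can pass a `ServiceContext` when building the index. Rough sketch, assuming a pre-0.10 llama_index to match the imports above (exact kwargs may differ by version):

```python
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Swap the LLM and chunking defaults; chunk_size feeds the SentenceSplitter
# that breaks documents into nodes before embedding.
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),
    chunk_size=512,      # default is 1024
    chunk_overlap=20,
)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
```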
The chat engine works by re-phrasing the user message into a standalone query using the chat history, retrieving the top-2 most relevant chunks and inserting them into the system prompt, then sending that plus the chat history to the LLM to create a response.
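If it helps to see those pieces explicitly, the same engine can be built by hand; a sketch assuming the pre-0.10 module layout (class locations may have moved in newer releases):

```python
from llama_index.chat_engine import CondensePlusContextChatEngine
from llama_index.memory import ChatMemoryBuffer

# Same behaviour as chat_mode="condense_plus_context", but with the knobs exposed:
# the retriever controls how many chunks get pulled into the system prompt, and
# the memory buffer is the chat history used to condense follow-up questions.
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=2),
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
)
```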
There are a few chat modes, detailed here:
https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/usage_pattern.html#available-chat-modes