A community member reports that their chat engine is slow, even on the first question. They added a profiler to the code and identified several methods that take a significant amount of time. Community members discuss potential ways to improve the efficiency of the chat engine, such as hosting options for local language models or embedding models, and using streaming to make the response feel faster. There is no explicitly marked answer in the comments.
Hi, my chat_engine (context mode) is currently quite slow, even on the first question. What are some good practices for increasing the efficiency of my chat engine?
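
For anyone trying to narrow down where the time goes, here is a minimal profiling sketch using only Python's standard library (`cProfile` and `pstats`). The functions `embed_query`, `retrieve_context`, `generate_answer`, and `chat` are hypothetical stand-ins with artificial delays, not the real chat engine API. The idea is to wrap your own chat call in `profile_call` and read the report sorted by cumulative time.

```python
import cProfile
import io
import pstats
import time


def embed_query(text: str) -> list[float]:
    """Hypothetical stand-in for the embedding call."""
    time.sleep(0.05)  # artificial delay, not a real measurement
    return [float(len(text))]


def retrieve_context(vector: list[float]) -> str:
    """Hypothetical stand-in for the vector-store lookup."""
    time.sleep(0.01)
    return "retrieved context"


def generate_answer(question: str, context: str) -> str:
    """Hypothetical stand-in for the LLM call."""
    time.sleep(0.1)
    return f"answer to {question!r} using {context}"


def chat(question: str) -> str:
    """Stand-in for one chat turn: embed, retrieve, then generate."""
    vector = embed_query(question)
    context = retrieve_context(vector)
    return generate_answer(question, context)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report).

    The report lists up to 10 entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    answer, report = profile_call(chat, "What is in my documents?")
    print(report)
```

If the report shows the generation step dominating, streaming the response or a faster model is probably where to look. If embedding or retrieval dominates, hosting of the embedding model is the more likely lever.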