Hi, my chat_engine (context mode) is currently quite slow

At a glance

The community member is experiencing slowness in their chat engine, even on the first question. They have added a profiler to the code and identified several methods that are taking a significant amount of time. The community members discuss potential ways to improve the efficiency of the chat engine, such as hosting options for local language models or embedding models, and using streaming to make the response feel faster. However, there is no explicitly marked answer in the comments.

Hi, my chat_engine (context mode) is currently quite slow, even on the first question. What are some good practices for increasing the efficiency of my chat engine?
8 comments
I've added a profiler to the code and these are the methods that take a lot of total time:
{method 'read' of '_ssl._SSLSocket' objects} (37 secs) (13 calls)
sessions.py:671(send) (37 secs) (2 calls)
adapters.py:435(send) (37 secs) (2 calls)
connectionpool.py:533(urlopen) (37 secs) (2 calls)
sessions.py:500(request) (37 secs) (2 calls)
connectionpool.py:378(_make_request) (37 secs) (2 calls)
engine_api_resource.py:127(create) (37 secs) (2 calls)
{method 'readline' of '_io.BufferedReader' objects} (37 secs) (62 calls) <- any ideas to bring this number of calls down?
context.py:147(chat) (37 secs) (1 call)
client.py:311(begin) (37 secs) (2 calls)
api_requestor.py:569(request_raw) (37 secs) (2 calls)
ssl.py:1300(recv_into) (37 secs) (13 calls)
socket.py:692(readinto) (37 secs) (13 calls)
Any help would be very much appreciated!
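For reference, output like the above can be produced with Python's built-in cProfile; a minimal sketch is below. The `chat_engine` object and the question text are placeholders (assumptions, not taken from the thread).

```python
import cProfile
import pstats

def run_chat():
    # chat_engine is assumed to be an already-built context-mode chat engine
    return chat_engine.chat("What do my documents say about pricing?")

profiler = cProfile.Profile()
profiler.enable()
run_chat()
profiler.disable()

# Sort by cumulative time to see which calls dominate
# (in the listing above, almost all of it is network wait on the API calls)
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(20)
```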
Context chat engine is already about as fast as it gets

There's a call to the embedding model to embed the user message and retrieve nodes

Then there's a single LLM call that reads the retrieved nodes and chat history and responds
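A minimal sketch of that flow in LlamaIndex, assuming the current `llama_index.core` package layout and a local `./data` folder (both are assumptions, not details from the thread):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Build an index over local documents (placeholder path)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Each .chat() call: 1) embeds the user message, 2) retrieves matching nodes,
# 3) sends the nodes plus chat history to the LLM in a single completion call.
chat_engine = index.as_chat_engine(chat_mode="context")
response = chat_engine.chat("Hi, what do my documents say about pricing?")
print(response)
```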
If you are using a local LLM or embedding model, you could look into hosting options to speed those up
Otherwise, you can use streaming to make it feel faster
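A hedged sketch of the streaming suggestion, assuming the chat engine built above and LlamaIndex's `stream_chat` / `response_gen` interface; total latency is unchanged, but tokens appear as they arrive:

```python
# Stream the response token by token instead of waiting for the full answer
streaming_response = chat_engine.stream_chat("Summarise the key points.")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
```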
Thanks Logan! What do you mean by hosting options? I'm testing the model locally but it seems equally slow when testing on the frontend.
The repo is on Bitbucket
Are you using a local LLM? I'm talking about running the LLM on vLLM or text-generation-inference or something that optimizes LLM inference
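A rough sketch of what such a hosting setup can look like: serve the model behind vLLM's OpenAI-compatible endpoint and point LlamaIndex at it via `OpenAILike`. The model name, port, and package layout below are assumptions, not details from the thread.

```python
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# Assumes a vLLM OpenAI-compatible server is already running locally,
# e.g. serving mistralai/Mistral-7B-Instruct-v0.2 on port 8000 (placeholder values).
Settings.llm = OpenAILike(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # whatever model vLLM is serving
    api_base="http://localhost:8000/v1",         # vLLM's OpenAI-compatible endpoint
    api_key="not-needed",                        # local server, no real key required
    is_chat_model=True,
)
```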