Hi, my chat_engine (context mode) is currently quite slow

At a glance

The community member is experiencing slowness in their chat engine, even on the first question. They have added a profiler to the code and identified several methods that are taking a significant amount of time. The community members discuss potential ways to improve the efficiency of the chat engine, such as hosting options for local language models or embedding models, and using streaming to make the response feel faster. However, there is no explicitly marked answer in the comments.

Hi, my chat_engine (context mode) is currently quite slow, even on the first question. What are some good practices for increasing the efficiency of my chat engine?
8 comments
I've added a profiler to the code and these are the methods that take a lot of total time:
{method 'read' of '_ssl._SSLSocket' objects} (37 secs) (13 calls)
sessions.py:671(send) (37 secs) (2 calls)
adapters.py:435(send) (37 secs) (2 calls)
connectionpool.py:533(urlopen) (37 secs) (2 calls)
sessions.py:500(request) (37 secs) (2 calls)
connectionpool.py:378(_make_request) (37 secs) (2 calls)
engine_api_resource.py:127(create) (37 secs) (2 calls)
{method 'readline' of '_io.BufferedReader' objects} (37 secs) (62 calls) <- any ideas to bring this number of calls down?
context.py:147(chat) (37 secs) (1 call)
client.py:311(begin) (37 secs) (2 calls)
api_requestor.py:569(request_raw) (37 secs) (2 calls)
ssl.py:1300(recv_into) (37 secs) (13 calls)
socket.py:692(readinto) (37 secs) (13 calls)
Any help would be very much appreciated!
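For reference, output like the above can be produced with Python's built-in cProfile; a minimal sketch is below. The `chat_engine` object and the question text are placeholders (assumptions, not taken from the thread).

```python
import cProfile
import pstats

def run_chat():
    # chat_engine is assumed to be an already-built context-mode chat engine
    return chat_engine.chat("What do my documents say about pricing?")

profiler = cProfile.Profile()
profiler.enable()
run_chat()
profiler.disable()

# Sort by cumulative time to see which calls dominate
# (in the listing above, almost all of it is network wait on the API calls)
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(20)
```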
Context chat engine is already about as fast as it gets

There's a call to the embedding model to embed the user message and retrieve nodes

Then there's a single LLM call that reads the retrieved nodes and chat history and responds
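A minimal sketch of that flow in LlamaIndex, assuming the current `llama_index.core` package layout and a local `./data` folder (both are assumptions, not details from the thread):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Build an index over local documents (placeholder path)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Each .chat() call: 1) embeds the user message, 2) retrieves matching nodes,
# 3) sends the nodes plus chat history to the LLM in a single completion call.
chat_engine = index.as_chat_engine(chat_mode="context")
response = chat_engine.chat("Hi, what do my documents say about pricing?")
print(response)
```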
If you are using a local LLM or embedding model, you could look into hosting options to speed those up
Otherwise, you can use streaming to make it feel faster
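A hedged sketch of the streaming suggestion, assuming the chat engine built above and LlamaIndex's `stream_chat` / `response_gen` interface; total latency is unchanged, but tokens appear as they arrive:

```python
# Stream the response token by token instead of waiting for the full answer
streaming_response = chat_engine.stream_chat("Summarise the key points.")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
```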
Thanks Logan! What do you mean by hosting options? I'm testing the model locally but it seems equally slow when testing on the frontend.
The repo is on Bitbucket
Are you using a local LLM? I'm talking about running the LLM on vLLM or text-generation-inference or something that optimizes LLM inference
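A rough sketch of what such a hosting setup can look like: serve the model behind vLLM's OpenAI-compatible endpoint and point LlamaIndex at it via `OpenAILike`. The model name, port, and package layout below are assumptions, not details from the thread.

```python
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# Assumes a vLLM OpenAI-compatible server is already running locally,
# e.g. serving mistralai/Mistral-7B-Instruct-v0.2 on port 8000 (placeholder values).
Settings.llm = OpenAILike(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # whatever model vLLM is serving
    api_base="http://localhost:8000/v1",         # vLLM's OpenAI-compatible endpoint
    api_key="not-needed",                        # local server, no real key required
    is_chat_model=True,
)
```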