Hi, my chat_engine (context mode) is currently quite slow, even on the first question. What are some good practices for increasing the efficiency of my chat engine?
I've added a profiler to the code and these are the methods that take a lot of total time:
{method 'read' of '_ssl._SSLSocket' objects} (37 secs) (13 calls)
sessions.py:671(send) (37 secs) (2 calls)
adapters.py:435(send) (37 secs) (2 calls)
connectionpool.py:533(urlopen) (37 secs) (2 calls)
sessions.py:500(request) (37 secs) (2 calls)
connectionpool.py:378(_make_request) (37 secs) (2 calls)
engine_api_resource.py:127(create) (37 secs) (2 calls)
{method 'readline' of '_io.BufferedReader' objects} (37 secs) (62 calls) <- any ideas to bring this number of calls down?
context.py:147(chat) (37 secs) (1 call)
client.py:311(begin) (37 secs) (2 calls)
api_requestor.py:569(request_raw) (37 secs) (2 calls)
ssl.py:1300(recv_into) (37 secs) (13 calls)
socket.py:692(readinto) (37 secs) (13 calls)
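(For reference, a profile like the one above can be collected with the standard-library cProfile module; the chat_engine object and the question here stand in for whatever is being measured.)

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
response = chat_engine.chat("my first question")  # placeholder call being profiled
profiler.disable()

# Sort by cumulative time and show the 20 most expensive calls
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)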
Any help would be very much appreciated!
Context chat engine is already about as fast as it gets

There's a call to the embedding model to embed the user message and retrieve nodes

Then there's a single LLM call to read the retrieved nodes and chat history and respond
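For reference, the flow described above maps onto a minimal LlamaIndex sketch like the one below (import paths vary between LlamaIndex versions, and the data directory and question are placeholders):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build an index over local documents (placeholder directory)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Context mode: each .chat() call embeds the user message, retrieves nodes,
# then makes one LLM call over the retrieved context plus the chat history
chat_engine = index.as_chat_engine(chat_mode="context")
response = chat_engine.chat("What does the report say about latency?")
print(response)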
If you are using a local LLM or embedding model, you could look into hosting options to speed those up.
Otherwise, you can use streaming to make it feel faster.
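A minimal streaming sketch, assuming the chat_engine from the sketch above (printing tokens as they arrive is just for illustration):

# Stream the response token by token instead of waiting for the full answer
streaming_response = chat_engine.stream_chat("What does the report say about latency?")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)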
Thanks Logan! What do you mean by hosting options? I'm testing the model locally but it seems equally slow when testing on the frontend.
The repo is on Bitbucket.
Are you using a local LLM? I'm talking about running the LLM on vLLM or text-generation-inference, or something else that optimizes LLM inference
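For example, one common pattern is to serve the model behind vLLM's OpenAI-compatible API and point LlamaIndex at it. This is only a sketch: the model name, port, and the OpenAILike integration shown here are assumptions that depend on your setup and LlamaIndex version.

# Serve the model with vLLM's OpenAI-compatible server, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

Settings.llm = OpenAILike(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model name
    api_base="http://localhost:8000/v1",         # vLLM endpoint (placeholder host/port)
    api_key="not-needed",                        # vLLM ignores the key by default
    is_chat_model=True,
)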