Hi, my chat_engine (context mode) is currently quite slow
Torsten
last year
Hi, my chat_engine (context mode) is currently quite slow, even on the first question. What are some good practices for increasing the efficiency of my chat engine?
Torsten
last year
I've added a profiler to the code and these are the methods that take a lot of total time:
{method 'read' of '_ssl._SSLSocket' objects} (37 secs) (13 calls)
sessions.py:671(send) (37 secs) (2 calls)
adapters.py:435(send) (37 secs) (2 calls)
connectionpool.py:533(urlopen) (37 secs) (2 calls)
sessions.py:500(request) (37 secs) (2 calls)
connectionpool.py:378(_make_request) (37 secs) (2 calls)
engine_api_resource.py:127(create) (37 secs) (2 calls)
{method 'readline' of '_io.BufferedReader' objects} (37 secs) (62 calls) <- any ideas to bring this number of calls down?
context.py:147(chat) (37 secs) (1 call)
client.py:311(begin) (37 secs) (2 calls)
api_requestor.py:569(request_raw) (37 secs) (2 calls)
ssl.py:1300(recv_into) (37 secs) (13 calls)
socket.py:692(readinto) (37 secs) (13 calls)
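For reference, a breakdown like this can be captured with Python's built-in cProfile around a single chat call (a minimal sketch; chat_engine stands for the context-mode engine in question and the prompt is a placeholder):

```python
# Minimal sketch: profile one chat() call with the standard library's cProfile.
# `chat_engine` is assumed to be the context-mode engine from the question.
import cProfile
import pstats

with cProfile.Profile() as profiler:
    chat_engine.chat("First question")  # placeholder prompt

# Sort by cumulative time and show the 15 most expensive entries,
# similar to the listing above.
stats = pstats.Stats(profiler)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(15)
```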
Torsten
last year
Any help would be very much appreciated!
Logan M
last year
Context chat engine is already about as fast as it gets
There's a call to embeddings to embed the user message and retrieve nodes
Then there's a single LLM call to read the retrieved nodes and chat history and respond
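In code, those two steps look roughly like this (a minimal sketch assuming a LlamaIndex VectorStoreIndex built over a local data directory; the path and import style are assumptions, not details from this thread):

```python
# Minimal sketch of LlamaIndex's context chat mode. The ./data directory
# and import paths (recent llama_index releases) are assumptions.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# chat_mode="context": each .chat() call (1) embeds the user message and
# retrieves nodes, then (2) makes a single LLM call over nodes + chat history.
chat_engine = index.as_chat_engine(chat_mode="context")
response = chat_engine.chat("What are good practices for speeding this up?")
print(response)
```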
Logan M
last year
If you are using a local LLM or embedding model, you could look into hosting options to speed those up
Logan M
last year
otherwise, you can use streaming to make it feel faster
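A minimal sketch of streaming, reusing the chat_engine from the snippet above (stream_chat / response_gen is LlamaIndex's streaming interface for chat engines; the prompt is a placeholder):

```python
# Print tokens as they arrive so the user sees output immediately,
# instead of waiting for the full response to be generated.
streaming_response = chat_engine.stream_chat("Why is the first call slow?")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
print()
```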
Torsten
last year
Thanks Logan! What do you mean by hosting options? I'm testing the model locally but it seems equally slow when testing on the frontend.
Torsten
last year
The repo is on bitbucket
Logan M
last year
Are you using a local LLM? I'm talking about running the LLM on vLLM or text-generation-inference or something that optimizes LLM inference
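Roughly, that setup looks like this (a hedged sketch: the model id, port, and the OpenAILike wrapper are illustrative assumptions, not details from the thread, and OpenAILike requires the llama-index-llms-openai-like package):

```python
# Serve the local model behind vLLM's OpenAI-compatible endpoint, then
# point the LlamaIndex chat engine at it. Model id and port are placeholders.
#
# Shell (separate process):
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mistral-7B-Instruct-v0.2 --port 8000

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
    api_base="http://localhost:8000/v1",
    api_key="fake",          # vLLM does not check the key by default
    is_chat_model=True,
)

# Reuse the index from the earlier sketch, now backed by the served LLM.
chat_engine = index.as_chat_engine(chat_mode="context", llm=llm)
```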