
Hi, I saw LLMLingua and tried it with LlamaIndex, also using llama.cpp to load the LLM. For each question, the time taken to get a prefix-match hit is too high. LLM inference time is reduced, but the time it takes to even reach the LLM is so high that my chat engine responds faster without LLMLingua. Any idea on this? I'm using only CPU.
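For reference, a minimal sketch of the setup being described, assuming a legacy (pre-0.10) llama-index install where LLMLingua is exposed as `LongLLMLinguaPostprocessor`; the model and data paths are hypothetical. Note that the compressor runs its own small language model, which on CPU can easily cost more time than the shorter prompt saves:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import LlamaCPP
from llama_index.postprocessor import LongLLMLinguaPostprocessor

# llama.cpp loads a local GGUF model on CPU.
llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local path
    temperature=0.1,
    context_window=3900,
)

# LLMLingua compresses the retrieved context before it reaches the LLM. This
# step itself runs a small LM, so on CPU it can dominate end-to-end latency,
# matching the slowdown described above.
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the question",
    target_token=300,
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data dir
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine(node_postprocessors=[node_postprocessor])
print(query_engine.query("What does the document say about X?"))
```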
Using CPU is the limiting factor here -- llama.cpp is just really not that fast
Also, LlamaIndex chat_engine.stream_chat is behaving weirdly: if I ask the same question again, it gives a strange response and generates its own "user:" turn with some garbage question. How do I prevent that?
And what's the alternative to llama.cpp for CPU-only machines?
Llama.cpp is the best option for CPU, lol

Also, I'm not sure about that chat thing. It depends on which chat engine you are using, how you set up the LLM, etc.
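A hedged sketch of one setup that often stops the invented "user:" turns, assuming they come from the model free-running past its answer on a raw transcript. It formats chat turns with the Llama-2 prompt helpers and passes explicit stop strings through to llama.cpp; the paths and stop list are illustrative assumptions, not a confirmed fix from this thread:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local path
    temperature=0.1,
    context_window=3900,
    # Wrap turns in the [INST] ... [/INST] format Llama-2 chat models expect,
    # so the model sees a chat, not a raw transcript to keep extending.
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    # Assumed pass-through to llama-cpp-python: hard-stop generation if the
    # model still starts writing the next user turn itself.
    generate_kwargs={"stop": ["User:", "user:"]},
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data(),  # hypothetical data dir
    service_context=service_context,
)

# "condense_question" rewrites each follow-up into a standalone query, so the
# LLM never sees (or tries to continue) the raw chat history.
chat_engine = index.as_chat_engine(chat_mode="condense_question")
streaming_response = chat_engine.stream_chat("Same question, asked again")
for token in streaming_response.response_gen:
    print(token, end="")
```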