Hi, I tried LLMLingua with LlamaIndex, using llama.cpp to load the LLM, and I'm running on CPU only. For each question, the time until I get the prefix-match hit (the llama.cpp prompt-cache log) is very high. LLM inference time is reduced, but the time spent before the LLM is even called is so long that my chat engine actually responds faster without LLMLingua. Any idea what's going on?
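For reference, here is roughly how I've wired it up (a minimal sketch, assuming the legacy `llama_index` import paths; the model path, thread count, and compression settings are placeholders, not my exact values):

```python
# Sketch of the setup: llama.cpp loads the model on CPU,
# LLMLingua compresses retrieved context before it reaches the LLM.
from llama_index.llms import LlamaCPP
from llama_index.postprocessor import LongLLMLinguaPostprocessor

# CPU-only llama.cpp model (placeholder path and thread count)
llm = LlamaCPP(
    model_path="./models/model.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=2048,
    model_kwargs={"n_threads": 8},
)

# LLMLingua prompt compression step, applied as a node postprocessor;
# this step runs before the main LLM call on every question.
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
)
```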
Also, LlamaIndex `chat_engine.stream_chat` is behaving weirdly: if I ask the same question again, it gives a strange response and starts generating a `user: <some garbage question>` turn by itself. How can I prevent this?
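This is roughly the calling pattern that reproduces it (a sketch; `index` stands for my already-built vector index and the question text is a placeholder):

```python
# "index" is a hypothetical pre-built VectorStoreIndex; chat_mode and
# the question are placeholders to show the repeated-question pattern.
chat_engine = index.as_chat_engine(chat_mode="context", llm=llm)

# First ask: response streams back normally.
response = chat_engine.stream_chat("What is X?")
for token in response.response_gen:
    print(token, end="", flush=True)

# Asking the exact same question again: this is when the model starts
# emitting its own "user: <garbage question>" turns in the stream.
response = chat_engine.stream_chat("What is X?")
for token in response.response_gen:
    print(token, end="", flush=True)
```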