
Mistral 7B

Hello everyone, I am using Mistral 7B Instruct as my LLM, but when plugging it into a chat engine I get incomplete responses compared to the Llama 2 model.
It could be because of the following reasons, I suppose:
  • What is the max_new_tokens value that you have set, and how much of the context window was left for the LLM to process and generate? (See the sketch after this list for raising the limit.)
  • Open-source LLMs smaller than 13B sometimes act weird.
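If the generation cap is the issue, you can raise it when constructing the LLM. A minimal sketch, assuming the legacy `llama_index` package layout and the `HuggingFaceLLM` wrapper (the model name and values are illustrative, not from this thread):

```python
from llama_index.llms import HuggingFaceLLM

# Hypothetical setup: Mistral 7B Instruct loaded locally via transformers.
llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    context_window=4096,   # total budget for prompt + completion
    max_new_tokens=512,    # raised from 256 so answers are not cut off
    device_map="auto",
)
```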
@WhiteFang_Jr max_new_tokens=256
Could it be that the response has already hit the 256 max new tokens?
Did you check on that?
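One quick way to check is to count the tokens in the truncated reply; if it lands right at the cap, the limit is what is cutting the answer off. A rough sketch, assuming the `transformers` tokenizer for Mistral and a `chat_engine` already built from your index:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

response = chat_engine.chat("Summarize the document.")  # illustrative query
n_tokens = len(tokenizer.encode(str(response)))
print(f"response length: {n_tokens} tokens")  # ~256 => truncated by max_new_tokens
```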
@WhiteFang_Jr can you please tell me why I am getting different responses from the chat engine and the query engine for the same query?
Which chat engine are you trying with? Condense mode?
@WhiteFang_Jr context mode
The context chat engine lets you set the system prompt. Do you have the same thing for the query engine as well, in the form of a template?

That could be causing the difference in the responses. Plus, the context mode maintains the chat history, which can also alter the response the LLM produces.
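For reference, this is roughly where the two paths diverge: the chat engine adds a system prompt and chat history on top of retrieval, while the query engine only sees its QA template. A sketch assuming the legacy `llama_index` API and an existing `index` (the prompt text is illustrative):

```python
from llama_index.prompts import PromptTemplate

# Context chat engine: system prompt + retrieved context + chat history.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt="You are a helpful assistant. Answer only from the given context.",
)

# Query engine: stateless; the answer is shaped only by the QA template.
qa_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using only the context above.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine = index.as_query_engine(text_qa_template=qa_template)

print(chat_engine.chat("What is the refund policy?"))    # history + system prompt
print(query_engine.query("What is the refund policy?"))  # template only
```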
@WhiteFang_Jr Regarding your 1st point, I am using the default template in both cases.

On the 2nd point: for the very first query, I think the chat history should be blank.
@WhiteFang_Jr can you suggest some ways to get better performance out of the chat engine?
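A few knobs that usually help, as a sketch assuming the legacy `llama_index` API (the values are illustrative starting points, not settings from this thread):

```python
from llama_index.memory import ChatMemoryBuffer

# Cap the chat history so it does not eat the context window the LLM
# needs for the retrieved chunks and the answer itself.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    similarity_top_k=3,  # tune how many chunks are retrieved per turn
    system_prompt=(
        "You are a helpful assistant. Answer strictly from the provided "
        "context, and say you don't know if the context is insufficient."
    ),
)
```

Beyond that: a larger max_new_tokens (as above), better chunking of the source documents, and resetting the memory between unrelated conversations (`chat_engine.reset()`) are the usual first things to try.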