For my chat_engine I used this code:

    chat_engine = vector_index.as_chat_engine(
        chat_mode="condense_plus_context",
        text_qa_template=text_qa_template,
        refine_template=refine_template,
    )

The responses from this chat engine are not condensed; I get a different length each time. How can I standardize the response size?
The "condense" part does not refer to response length; in condense_plus_context it means the chat history and latest message are condensed into a standalone question before retrieval.

Only prompt engineering can standardize output length. The LLM will keep writing until it thinks it's done
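If a hard cap is acceptable, the LLM's own max_tokens setting will truncate generation outright. A minimal sketch, assuming the llama_index OpenAI integration and a recent version where the LLM can be passed directly into as_chat_engine (the model name is illustrative):

    # Sketch: cap output length at the LLM level (hard truncation).
    # max_tokens clips the completion at 256 tokens, which guarantees
    # a maximum size but can cut the answer off mid-sentence, so it
    # pairs best with prompt guidance rather than replacing it.
    from llama_index.llms.openai import OpenAI

    llm = OpenAI(model="gpt-3.5-turbo", max_tokens=256)
    chat_engine = vector_index.as_chat_engine(
        chat_mode="condense_plus_context",
        llm=llm,
    )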
Thanks for answering
Won't this part fix it?

    from llama_index.core.llms import ChatMessage, MessageRole

    chat_text_qa_msgs = [
        ChatMessage(
            role=MessageRole.SYSTEM,
            content=(
                "Always answer the question, even if the context isn't helpful.\n"
                "Max number of answer tokens is 256, with 30 tokens exceeding the limit if you really cannot write it any other way.\n"
                "To the best of your ability and within the given context, try to provide helpful information about anything regarding the WH2C project.\n"
                "Politely refuse to answer any questions that differ from the context of your {context_str}.\n"
            ),
        ),
    ]
That assumes the LLM knows how to count 😉 Probably better to ask it to limit its response to 1-3 sentences or something like that.
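For instance, a sentence-count instruction (hypothetical wording, swapped in for the token-count line above) tends to be followed much more reliably:

    # Hypothetical revision of the system prompt: LLMs follow sentence
    # counts far more reliably than token counts.
    chat_text_qa_msgs = [
        ChatMessage(
            role=MessageRole.SYSTEM,
            content=(
                "Always answer the question, even if the context isn't helpful.\n"
                "Limit every answer to at most three sentences.\n"
                "Only discuss the WH2C project; politely refuse questions outside the given {context_str}.\n"
            ),
        ),
    ]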
Correct, yes... I will adjust the prompt... thanks
Can I ask you one more question regarding the thing I am building?