For my chat_engine I used this code:

    chat_engine = vector_index.as_chat_engine(
        chat_mode="condense_plus_context",
        text_qa_template=text_qa_template,
        refine_template=refine_template,
    )

The responses from this chat engine are not condensed; I get a different length each time. How can I standardize the response size?
The "condense" part does not refer to response length; in condense_plus_context it means the chat history and latest message are condensed into a standalone question before retrieval.

Only prompt engineering can standardize output length. The LLM will keep writing until it thinks it's done
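If a hard cap is acceptable, the LLM's own max_tokens setting will truncate generation outright. A minimal sketch, assuming the llama_index OpenAI integration and a recent version where the LLM can be passed directly into as_chat_engine (the model name is illustrative):

    # Sketch: cap output length at the LLM level (hard truncation).
    # max_tokens clips the completion at 256 tokens, which guarantees
    # a maximum size but can cut the answer off mid-sentence, so it
    # pairs best with prompt guidance rather than replacing it.
    from llama_index.llms.openai import OpenAI

    llm = OpenAI(model="gpt-3.5-turbo", max_tokens=256)
    chat_engine = vector_index.as_chat_engine(
        chat_mode="condense_plus_context",
        llm=llm,
    )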
Thanks for answering
Won't this part fix it?

    from llama_index.core.llms import ChatMessage, MessageRole

    chat_text_qa_msgs = [
        ChatMessage(
            role=MessageRole.SYSTEM,
            content=(
                "Always answer the question, even if the context isn't helpful.\n"
                "Max number of answer tokens is 256, with 30 tokens exceeding the limit if you really cannot write it any other way.\n"
                "To the best of your ability and within the given context, try to provide helpful information about anything regarding the WH2C project.\n"
                "Politely refuse to answer any questions that differ from the context of your {context_str}.\n"
            ),
        ),
    ]
That assumes the LLM knows how to count 😉 Probably better to ask it to limit its response to 1-3 sentences or something like that.
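For instance, a sentence-count instruction (hypothetical wording, swapped in for the token-count line above) tends to be followed much more reliably:

    # Hypothetical revision of the system prompt: LLMs follow sentence
    # counts far more reliably than token counts.
    chat_text_qa_msgs = [
        ChatMessage(
            role=MessageRole.SYSTEM,
            content=(
                "Always answer the question, even if the context isn't helpful.\n"
                "Limit every answer to at most three sentences.\n"
                "Only discuss the WH2C project; politely refuse questions outside the given {context_str}.\n"
            ),
        ),
    ]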
Correct, yes... I will adjust the prompt... thanks
Can I ask you one more question regarding the thing I am building?