I do wonder if this should be supported. I think it could be, by updating a few lines of code so that `as_query_engine` forwards the template:
```python
# What I originally tried -- the summary_template kwarg didn't take effect here
query_engine = index.as_query_engine(
    response_mode="tree_summarize",
    service_context=service_context,
    summary_template=custom_chat_template,
)
```
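(For context, `custom_chat_template` above is just a chat-style prompt. A minimal sketch of how it might be built, assuming the standard `ChatPromptTemplate` from this era of LlamaIndex; the message contents are placeholders:)

```python
from llama_index.llms import ChatMessage, MessageRole
from llama_index.prompts import ChatPromptTemplate

# Placeholder messages; tree_summarize templates are filled with
# {context_str} and {query_str}
custom_chat_template = ChatPromptTemplate(
    message_templates=[
        ChatMessage(role=MessageRole.SYSTEM, content="You are a concise summarizer."),
        ChatMessage(
            role=MessageRole.USER,
            content="Context:\n{context_str}\n\nAnswer the question: {query_str}",
        ),
    ]
)
```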
I got around it by building the response synthesizer myself:
```python
from llama_index import get_response_synthesizer

# Build the synthesizer directly so the custom summary template is honored
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
    summary_template=custom_chat_template,
    service_context=service_context,
)
query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
```
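With that in place, querying works as usual (the question text here is just an example):

```python
response = query_engine.query("Give me a high-level summary of the document.")
print(response)
```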
Now I can see my custom template being used when I inspect the recorded LLM calls:
```python
# event_pairs is a list of (start_event, end_event) pairs, one per LLM call;
# printing the start event of the third call shows the prompt that was sent
event_pairs = llama_debug.get_llm_inputs_outputs()
print(event_pairs[2][0])
```
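In case it helps anyone reproducing this, here is roughly how `llama_debug` was wired up; a sketch assuming the standard `LlamaDebugHandler` callback:

```python
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# Register the debug handler on the service context so LLM calls are recorded
llama_debug = LlamaDebugHandler(print_trace_on_end=False)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([llama_debug])
)
```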