Another question - in the query engine we are using the default COMPACT response synthesizer, but we noticed that it does a significant amount of chunking, which seems to significantly increase costs and latency. According to the documentation this seems to be normal ( https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/ ). Is there any way of disabling the chunking in any shape or form?
Yeah, my bad, wrong choice of words - basically we saw that some responses required 5-7 LLM calls, which adds noticeable extra latency.
We would like to see how it behaves if we completely disable the chunking (basically remove the refine step and keep it fully compact, so there is only one LLM call per query).
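One option worth checking (an assumption on my side, not a confirmed fix): LlamaIndex also ships a `simple_summarize` response mode that truncates the retrieved chunks to fit a single prompt, so you get exactly one LLM call instead of the compact+refine loop. The core idea - pack chunks into one prompt up to a budget and drop the overflow instead of spilling into refine calls - can be sketched in plain Python; `pack_chunks` and the character budget here are hypothetical names for illustration, not LlamaIndex APIs:

```python
def pack_chunks(chunks, budget_chars=4000, sep="\n\n"):
    """Concatenate retrieved chunks into ONE prompt context, truncating
    anything past `budget_chars` so no second (refine) LLM call is needed."""
    packed = []
    used = 0
    for text in chunks:
        # Reserve room for the separator once we already have content.
        remaining = budget_chars - used - (len(sep) if packed else 0)
        if remaining <= 0:
            break  # budget exhausted: drop the rest instead of refining
        piece = text[:remaining]
        if packed:
            used += len(sep)
        packed.append(piece)
        used += len(piece)
    return sep.join(packed)

# Everything past the budget is simply cut off -> a single LLM call.
context = pack_chunks(["chunk one " * 100, "chunk two " * 100], budget_chars=600)
```

In LlamaIndex itself the equivalent (if I am reading the docs right) would be passing `response_mode="simple_summarize"` to `get_response_synthesizer` / the query engine, at the cost of silently losing any context that does not fit the prompt window.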