Hi, may I know what the maximum prompt size between text_qa_template and refine_template is, as mentioned for the compact response mode here?
https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/#:~:text=Details%3A%20stuff,between%20text%20chunks).
I am retrieving a long context, and it looks like my prompts are being cut at around 4000 tokens, with the remaining context handled by the refine step. May I know why?
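For context, here is roughly how I am building the query engine. This is a minimal sketch, assuming llama-index >= 0.10 with the OpenAI integration installed; the model name, data path, and query are placeholders, not my exact values.

```python
# Minimal sketch of my setup (placeholder model, path, and query).
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-3.5-turbo")  # placeholder model choice

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    response_mode="compact",  # pack retrieved chunks, refine only if they overflow
    similarity_top_k=10,
)
response = query_engine.query("placeholder question about my documents")
print(response)
```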
For reference, the passage from the docs I am referring to is: "Details: stuff as many text (concatenated/packed from the retrieved chunks) that can fit within the context window (considering the maximum prompt size between text_qa_template and refine_template). If the text is too long to fit in one prompt, it is split in as many parts as needed (using a TokenTextSplitter and thus allowing some overlap between text chunks)."
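If I read that correctly, the room left for packed chunks per LLM call would be roughly the model's context window minus the tokens reserved for the output and the larger of the two templates. A back-of-the-envelope sketch of my understanding, where every number is an illustrative assumption on my part rather than a value from the docs:

```python
# My rough understanding of the packing limit (all numbers are assumptions).
context_window = 4096        # the LLM's total context window
num_output = 256             # tokens reserved for the model's answer
max_template_tokens = 200    # the larger of text_qa_template and refine_template

# Tokens left for packed retrieved chunks in a single LLM call:
available_for_chunks = context_window - num_output - max_template_tokens
print(available_for_chunks)  # ~3640 here, which would explain a cut near 4k
```

Is that the right way to think about it, i.e. is the ~4000-token cut coming from my LLM's context window rather than from the response mode itself?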