Is there a way to control how many times refined_response runs? In many cases the initial_response is already good enough, but the query still goes on to produce 2 or 3 refined_response passes. How do I tell the query "good enough, please stop"?
Hmm, I don't know enough to give you a confident answer yet. However, you can refer to the GPT index documentation for more information: https://gpt-index.readthedocs.io/en/latest
@Logan M hi, could you share some ideas? I set chunk_size_limit=2000 when creating the index and chunk_size_limit=500 at query time, but the nodes are still typically over 1000 tokens long. Refining 3 times costs nearly 10000 tokens and takes about 20 seconds... Is there a way to tell the query to return the initial_response directly?
Yea, so since there are two chunk sizes, even if top_k is 1, it breaks that 2000-token chunk into 4 chunks of 500 and refines across them
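To make that arithmetic concrete, here's a minimal sketch assuming the legacy gpt_index API of that era (GPTSimpleVectorIndex, SimpleDirectoryReader, and the "compact" response mode; the ./data path and the question string are placeholders, and older versions may not have "compact" yet):

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Index-time chunks are 2000 tokens; at query time each retrieved node is
# re-split into 500-token pieces: 2000 / 500 = 4 chunks, i.e. one
# initial_response plus 3 refined_response calls, matching the thread above.
documents = SimpleDirectoryReader("./data").load_data()
index = GPTSimpleVectorIndex(documents, chunk_size_limit=2000)

# response_mode="compact" packs as much retrieved text as fits into each
# prompt, which usually collapses those 4 LLM calls into far fewer.
response = index.query(
    "your question here",
    similarity_top_k=1,
    response_mode="compact",
)
```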
I can picture adding early_stop and max_iterations options to the query. max_iterations would just stop refining the answer after X LLM calls; early stopping would stop once the overlap between the answer and the context is large enough (e.g. a ROUGE score)
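As a rough illustration of those two proposed knobs (nothing here is real library code: the synthesize loop, the llm_answer/llm_refine callables, and the unigram-overlap stand-in for a ROUGE score are all invented for the sketch):

```python
from typing import Callable, List

def unigram_recall(answer: str, context: str) -> float:
    """Crude stand-in for a ROUGE-style overlap score: the fraction of
    answer words that also appear in the context."""
    answer_words = set(answer.lower().split())
    if not answer_words:
        return 0.0
    context_words = set(context.lower().split())
    return len(answer_words & context_words) / len(answer_words)

def synthesize(
    query: str,
    chunks: List[str],
    llm_answer: Callable[[str, str], str],       # (query, context) -> initial answer
    llm_refine: Callable[[str, str, str], str],  # (query, prior answer, context) -> refined answer
    max_iterations: int = 3,
    early_stop_threshold: float = 0.8,
) -> str:
    """Refine loop with the two proposed knobs: a hard cap on LLM calls,
    and an early stop when the answer already overlaps its context enough."""
    answer = llm_answer(query, chunks[0])        # initial_response (1st LLM call)
    seen_context = chunks[0]
    llm_calls = 1
    for chunk in chunks[1:]:
        if llm_calls >= max_iterations:          # max_iterations: stop after X LLM calls
            break
        if unigram_recall(answer, seen_context) >= early_stop_threshold:
            break                                # early stop: "good enough, please stop"
        answer = llm_refine(query, answer, chunk)  # refined_response
        seen_context += " " + chunk
        llm_calls += 1
    return answer
```

Note that with max_iterations=1 this degenerates to "return the initial_response directly", which is exactly the behavior asked for at the top of the thread.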
Got it. Until that change lands, I'll reindex everything with chunk_size_limit=500. An interesting thing: yesterday one query refined 19 times. That's actually a nice feature for content much longer than the LLM's response limit, but in many circumstances the refined responses are not as good as the initial response either.
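That reindexing plan, sketched under the same legacy-API assumption as above: with index-time chunks already at 500 tokens, a top_k=1 retrieval returns a single node that needs no re-splitting, so the answer should come back in one LLM call.

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Rebuild so index-time chunks already match the query-time chunk size;
# a top_k=1 retrieval then returns one 500-token node and triggers no refines.
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = GPTSimpleVectorIndex(documents, chunk_size_limit=500)
index.save_to_disk("index_500.json")

response = index.query("your question here", similarity_top_k=1)
```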