Is there any way to do `query_engine.query(query)` without doing a refinement?

```python
query_engine = index.as_query_engine(similarity_top_k=1, response_mode='compact')
query_engine.query(query)
```

I do this, but the refine step runs multiple times.
Did you set any other settings when constructing the index? It technically shouldn't hit the refine process with all default settings (at least in recent versions of llama-index)
@Logan M You were correct. When I simplified the settings, refine did not run. However, a new problem came up: the request exceeds the token limit accepted by the LLM (`ValueError: Requested tokens exceed context window of 512`). I'm using wizard-vicuna-13B.ggmlv3.q8_0.bin, but does this mean it only accepts up to 512 tokens? I think the simple solution is to reduce the chunk size when creating the index. Is that correct? How can I do that?
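Not from this thread, but one way to shrink the chunk size is to pass it in when building the index. This is a sketch assuming a llama-index version from the `ServiceContext` era (roughly 0.6–0.9); newer releases replaced this with a global `Settings.chunk_size`, and the exact numbers here are illustrative, not tuned:

```python
# Hypothetical sketch: make each retrieved node small enough that the node,
# the prompt template, and the answer all fit in a 512-token context window.
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader

# Smaller chunks leave room for the query, prompt boilerplate, and output.
service_context = ServiceContext.from_defaults(
    chunk_size=256,   # tokens per node when splitting documents (assumed value)
    num_output=128,   # tokens reserved for the LLM's answer (assumed value)
)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# With similarity_top_k=1 and one 256-token chunk, a single LLM call should fit.
query_engine = index.as_query_engine(similarity_top_k=1, response_mode="compact")
```

Note that the 512 limit is usually the context window you configured on the LLM wrapper (e.g. `n_ctx` for GGML models via llama-cpp), not a hard property of the weights, so raising that is the other lever to consider.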