Is there any way to do query engine

Is there any way to do query_engine.query(query) without doing a refinement?
query_engine = index.as_query_engine(similarity_top_k=1, response_mode='compact')
query_engine.query(query)
I do this, but the refine runs multiple times.
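For context, "compact" mode packs as many retrieved chunks as fit into one prompt and only falls back to extra refine calls when the retrieved text exceeds the context window. A rough, self-contained sketch of that call-count behavior (an illustrative model, not llama-index's actual implementation; the function name and `prompt_overhead` parameter are made up for this example):

```python
def estimate_llm_calls(chunk_sizes, context_window, prompt_overhead=100):
    """Rough model of 'compact' mode: pack retrieved chunks into prompts
    that fit the context window; each packed prompt is one LLM call
    (the first answers, the rest refine). Illustrative only."""
    budget = context_window - prompt_overhead  # room left for retrieved text
    calls, used = 1, 0
    for size in chunk_sizes:
        if used + size > budget:  # chunk doesn't fit -> another (refine) call
            calls += 1
            used = 0
        used += size
    return calls

# One 200-token chunk fits a 512-token window in a single call:
print(estimate_llm_calls([200], 512))            # 1
# Several large chunks force multiple refine calls:
print(estimate_llm_calls([400, 400, 400], 512))  # 3
```

With `similarity_top_k=1` there is only one retrieved chunk, so as long as that chunk fits the window, no refine pass should run.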
4 comments
Did you set any other settings when constructing the index? It technically shouldn't hit the refine process with all default settings (at least in recent versions of llama-index)
@Logan M You were correct. When simplified, refine did not run. However, the problem that arose again is that it exceeds the token size accepted by LLM (ValueError: Requested tokens exceed context window of 512). I'm using wizard-vicuna-13B.ggmlv3.q8_0.bin, but this means it only accepts up to 512 tokens, right? I think the simple solution is to reduce the chunk size when creating the index, is that correct? How can I do that?
512 is pretty small imo, but yes, you can do that like this

Maybe 200 is small enough?
service_context = ServiceContext.from_defaults(..., chunk_size=200)
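To see why this helps: `chunk_size` caps how many tokens each indexed node holds, so a top-1 retrieval of a 200-token chunk leaves plenty of headroom in a 512-token window. A minimal sketch of the idea, using a naive whitespace-token splitter standing in for llama-index's actual node parser (the function name is hypothetical):

```python
def split_into_chunks(text, chunk_size=200):
    """Naive whitespace-token splitter, standing in for llama-index's
    node parser: every chunk holds at most chunk_size tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

document = "word " * 1000  # a 1000-token toy document
chunks = split_into_chunks(document, chunk_size=200)
print(len(chunks))                                 # 5 chunks
print(all(len(c.split()) <= 200 for c in chunks))  # True
```

Every chunk stays well under the 512-token context window, so a single retrieved chunk plus the prompt should fit in one LLM call.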
Thank you very much. I apologize for the late reply. I was able to execute it successfully.