Is there any way to do `query_engine.query(query)` without doing a refinement?

```python
query_engine = index.as_query_engine(similarity_top_k=1, response_mode='compact')
query_engine.query(query)
```

I do this, but the refine step runs multiple times.
Did you set any other settings when constructing the index? It technically shouldn't hit the refine process with all default settings (at least in recent versions of llama-index)
@Logan M You were correct. When I simplified the settings, refine did not run. However, a new problem came up: the request exceeds the token limit accepted by the LLM (`ValueError: Requested tokens exceed context window of 512`). I'm using wizard-vicuna-13B.ggmlv3.q8_0.bin, but does this mean it only accepts up to 512 tokens? I think the simple solution is to reduce the chunk size when creating the index. Is that correct? How can I do that?
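Not from this thread, but one way to shrink the chunk size is to pass it in when building the index. This is a sketch assuming a llama-index version from the `ServiceContext` era (roughly 0.6–0.9); newer releases replaced this with a global `Settings.chunk_size`, and the exact numbers here are illustrative, not tuned:

```python
# Hypothetical sketch: make each retrieved node small enough that the node,
# the prompt template, and the answer all fit in a 512-token context window.
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader

# Smaller chunks leave room for the query, prompt boilerplate, and output.
service_context = ServiceContext.from_defaults(
    chunk_size=256,   # tokens per node when splitting documents (assumed value)
    num_output=128,   # tokens reserved for the LLM's answer (assumed value)
)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# With similarity_top_k=1 and one 256-token chunk, a single LLM call should fit.
query_engine = index.as_query_engine(similarity_top_k=1, response_mode="compact")
```

Note that the 512 limit is usually the context window you configured on the LLM wrapper (e.g. `n_ctx` for GGML models via llama-cpp), not a hard property of the weights, so raising that is the other lever to consider.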