Set chunk_size_limit=512 in the service context when constructing the index.
Query with index.query(..., response_mode="compact"), which in theory means everything will fit into one LLM call.
Alternatively, pass response_mode="no_text" in the query, which will skip calling the LLM entirely and only return the source nodes in the response object.
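To make the difference between the two response modes concrete, here is a toy sketch in plain Python. It is not the library's implementation: split_into_chunks, pack_chunks, toy_query, and max_prompt_words are hypothetical names invented for illustration, "tokens" are approximated by whitespace-separated words, and the LLM is just a callable you pass in. It only shows the idea that small chunks can be packed into a single prompt ("compact") and that "no_text" returns the sources without any LLM call.

```python
from typing import Callable, Dict, List, Optional


def split_into_chunks(text: str, chunk_size_limit: int) -> List[str]:
    """Split text into chunks of at most chunk_size_limit words (a stand-in for tokens)."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size_limit])
        for i in range(0, len(words), chunk_size_limit)
    ]


def pack_chunks(chunks: List[str], max_prompt_words: int) -> List[str]:
    """Greedily pack chunks into as few prompts as fit the (assumed) word budget."""
    prompts: List[str] = []
    current: List[str] = []
    used = 0
    for chunk in chunks:
        size = len(chunk.split())
        # Start a new prompt only when the current one would overflow.
        if current and used + size > max_prompt_words:
            prompts.append("\n\n".join(current))
            current, used = [], 0
        current.append(chunk)
        used += size
    if current:
        prompts.append("\n\n".join(current))
    return prompts


def toy_query(
    question: str,
    source_chunks: List[str],
    llm: Callable[[str], str],
    response_mode: str = "compact",
    max_prompt_words: int = 2000,  # hypothetical context budget
) -> Dict[str, Optional[object]]:
    """Toy query: 'no_text' skips the LLM, 'compact' packs chunks into few prompts."""
    if response_mode == "no_text":
        # No LLM call at all: hand back only the retrieved sources.
        return {"response": None, "source_nodes": source_chunks}
    if response_mode == "compact":
        answers = [
            llm(context + "\n\nQuestion: " + question)
            for context in pack_chunks(source_chunks, max_prompt_words)
        ]
        # Simplification: answers are joined here rather than refined step by step.
        return {"response": " ".join(answers), "source_nodes": source_chunks}
    raise ValueError("unsupported response_mode: " + response_mode)


if __name__ == "__main__":
    chunks = split_into_chunks("lorem " * 1200, chunk_size_limit=512)
    result = toy_query("What is this about?", chunks[:2], llm=lambda p: "stub answer")
    print(result["response"])
```

With a 512-word chunk limit and a budget of 2000 words, two retrieved chunks land in one prompt, so the stub LLM is called once. If the chunks were larger than the budget allows, the same code would fall back to several calls, which is why the single-call behavior only holds "in theory".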