Based on the information provided in the knowledge sources, here are some suggestions for the context window limits you're hitting when using the summary index in your agent query.
The problem is most likely that the summary index retrieves all of its nodes for every query, so a large document set pushes a lot of content into each prompt. Here are some modifications you can try:
- Use a reranker:
Adding a reranker to your node postprocessors keeps only the most relevant nodes, which reduces how much content is passed into the context window. For example, you could use SentenceTransformerRerank:
from llama_index.core.postprocessor import SentenceTransformerRerank

# Keep only the top 3 nodes after cross-encoder reranking
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3
)

node_postprocessors = [rerank]
Then, apply this to your summary query engine:
summary_query_engine = summary_index.as_query_engine(
    llm=llm_4o_mini,
    node_postprocessors=node_postprocessors,
)
- Adjust the response mode:
For the summary index, you might want to explicitly set the response mode to "tree_summarize" and enable async processing:
summary_query_engine = summary_index.as_query_engine(
    llm=llm_4o_mini,
    response_mode="tree_summarize",
    use_async=True,
)
With tree_summarize, chunks are summarized recursively and those summaries are then summarized again, and use_async=True lets those LLM calls run concurrently, so large amounts of content are handled more efficiently without stuffing everything into a single prompt. A combined sketch of both options is shown below.
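If it helps, here is a minimal sketch that combines both suggestions and wraps the result as a tool for your agent. It assumes summary_index and llm_4o_mini come from your existing setup; the tool name and description are illustrative placeholders, not anything defined in your code:
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.tools import QueryEngineTool

# Rerank retrieved nodes and keep only the top 3 most relevant ones
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3
)

# Combine reranking with async tree summarization in one query engine
summary_query_engine = summary_index.as_query_engine(
    llm=llm_4o_mini,
    response_mode="tree_summarize",
    use_async=True,
    node_postprocessors=[rerank],
)

# Wrap the query engine as an agent tool; name and description are placeholders
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    name="summary_tool",
    description="Useful for summarization questions over the indexed documents.",
)
Because the reranker runs before the response synthesizer, only the top-ranked nodes reach the tree-summarize step, which keeps prompt sizes down even when the underlying index is large.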