Curious what others say here as well, but I think this is just a limitation of trying to do a SummaryIndex. You can try a different response mode, but do you really need to summarize all 100 nodes?
I don't think I'm following - the summary index isn't the problem, it's having to specify a summarizer with RouterQueryEngine:
RouterQueryEngine.from_defaults(
    selector=PydanticSingleSelector.from_defaults(verbose=True),
    query_engine_tools=[
        self._get_summary_index_query_engine_tool(),
        self._get_vector_index_query_engine_tool(),
        self._get_keyword_index_query_engine_tool(),
    ],
    summarizer=ResponseMode.TREE_SUMMARIZE,  # this part is the issue
    service_context=self.get_service_context(),
)
Regardless of the ResponseMode I try, the result is the same
And some indices are going to be composed of 10k+ nodes, so the summarizer approach just isn't going to work for me
The summarizer in the router query engine is only used to summarize/aggregate responses from all sub-responses.
In your example above, since you are using a single selector, it won't even be used actually. It will just query whatever sub-index is selected and return that response
If each sub-index needs to query a ton of data, that's really the issue
Summarizing a large index is a bottleneck you can't really get around
You can set response_mode="tree_summarize" and use_async=True, but at the end of the day you are still sending a lot of data to the LLM
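As a config fragment, that suggestion would look something like this (a sketch assuming the pre-0.9 `llama_index` API used elsewhere in this thread; `index` stands for an already-built index):

```python
# Sketch: bottom-up summarization with concurrent LLM calls
# (`index` is assumed to be an already-built index, e.g. a summary index)
query_engine = index.as_query_engine(
    response_mode="tree_summarize",  # build a bottom-up tree of summaries
    use_async=True,                  # issue the per-chunk LLM calls concurrently
)
response = query_engine.query("Summarize this document")
```

Even with async fan-out, every node's text still passes through the LLM at least once, which is the cost that can't be avoided.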
Hmm. Thanks for clarifying and pointing out my actual issue.
Let me think about this specific problem a little bit more.
Speaking generally, I wonder if there's a way to make summarization more efficient. Like, could you create a doc of summaries and then stick a vector index on top of it - so precomputed summaries, and you just query those?
I don’t really understand how summarization works generally so it’s probably a dumb idea out of lack of understanding
Alternatively, what if summarization could be "cached" - meaning, as summaries are created they are stored and reused in the future - similar, I guess, to memoization in recursion
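The memoization idea could be sketched in plain Python. The `summarize` callable below is a toy stand-in for an LLM call (not llama_index API); the cache key is a hash of the chunk text, so an identical chunk is only "summarized" once:

```python
import hashlib
from typing import Callable


def make_cached_summarizer(summarize: Callable[[str], str]):
    """Wrap a summarizer so repeated text is only summarized once.

    `summarize` stands in for an LLM call; the cache key is a hash of
    the input text, so identical chunks (e.g. leaf nodes of a summary
    tree) reuse the stored summary instead of paying for another call.
    """
    cache: dict[str, str] = {}
    calls = {"count": 0}

    def cached(text: str) -> str:
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in cache:
            calls["count"] += 1  # only cache misses hit the "LLM"
            cache[key] = summarize(text)
        return cache[key]

    cached.calls = calls  # expose the miss count for inspection
    return cached


# Toy "summarizer": just take the first 20 characters
summarizer = make_cached_summarizer(lambda t: t[:20])
summarizer("Barack Obama's early life ...")
summarizer("Barack Obama's early life ...")  # cache hit, no second call
```

Whether this helps in practice depends on the point made below: if the summarization prompt embeds the user's query, identical inputs rarely recur across queries.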
So summarization largely depends on the query.
You might have a query like "Summarize this document" which is quite general. Or you might have a query like "Summarize all mentions of topic X" which is quite targeted. But both are summaries.
In general, if you use the tree summarize approach, it works by building a bottom-up tree of summaries: chunks of text are summarized, then pairs of summaries are summarized, until you have a single root summary that is returned
If you only expect to have one type of summary query, you could definitely cache it
But I suspect some parts of the summary tree will repeat
but maybe it will only be leaves
like "summarize barack obama's life" - maybe summarizes his early life, education, ... all of which can be reused, but then i guess the aggregation summary is where things get unique to the query
Alternatively, I wonder if there's a way to determine from the query the nodes that will be useful (reduce the search space) and then summarize those..?
I think nothing will repeat, since the prompt to summarize includes the users query 🤔
That would just be a vector index I think ha, likely with either a larger top k or a similarity threshold, and of course the tree summarize response mode
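The "larger top k plus a similarity threshold" retrieval being described can be illustrated in plain Python (a toy cosine-similarity retriever, not llama_index internals):

```python
import math


def retrieve(query_emb, node_embs, top_k=10, cutoff=0.7):
    """Toy retriever: rank nodes by cosine similarity, keep a large
    top_k, then drop anything scoring below the cutoff.
    `node_embs` maps node id -> embedding (same dimension as the query!).
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scored = sorted(
        ((cosine(query_emb, e), nid) for nid, e in node_embs.items()),
        reverse=True,
    )
    return [nid for score, nid in scored[:top_k] if score >= cutoff]
```

The surviving nodes would then be fed to the tree-summarize response synthesizer.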
I need to read - I'm sorry
do you think it's worthwhile to put a router on top of two vector indices then where one has this configuration and the other one is the default (which I assume would be more efficient in non-summarization scenarios)?
when exactly is a post_processor applied?
After retrieving nodes, and before sending them to the response synthesizer
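That ordering can be sketched as a toy pipeline (plain Python, with hypothetical `retriever`/`synthesizer` callables standing in for the real components):

```python
# Toy query pipeline showing where a node postprocessor runs:
# after the retriever returns scored nodes, before the response
# synthesizer ever sees them.
def query(question, retriever, postprocessors, synthesizer):
    nodes = retriever(question)        # 1. retrieve candidate nodes
    for pp in postprocessors:          # 2. postprocess (filter/rerank)
        nodes = pp(nodes)
    return synthesizer(question, nodes)  # 3. synthesize the answer


# Example postprocessor: drop nodes under a similarity-score threshold
def drop_low(nodes, cutoff=0.5):
    return [n for n in nodes if n["score"] >= cutoff]
```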
Not all heroes wear capes
!!! It works!!! (for a small-ish set)
print(query_engine.query(query))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
response = self._query(str_or_query_bundle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/query_engine/router_query_engine.py", line 145, in _query
responses.append(selected_query_engine.query(query_bundle))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
response = self._query(str_or_query_bundle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
/indices/vector_store/retrievers/retriever.py", line 164, in _get_nodes_with_embeddings
query_result = self._vector_store.query(query, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/vector_stores/simple.py", line 221, in query
top_similarities, top_ids = get_top_k_embeddings(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/embedding_utils.py", line 30, in get_top_k_embeddings
similarity = similarity_fn(query_embedding_np, emb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/embeddings/base.py", line 48, in similarity
product = np.dot(embedding1, embedding2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)
😢
It works with the single RouterQueryEngine when querying that directly, at least it did. Once I started querying the MultiSelector RouterQueryEngine, the above happened
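For what it's worth, that ValueError usually means two different embedding models got mixed: 1536 is the output dimension of OpenAI's text-embedding-ada-002, while 384 is typical of small sentence-transformers models, so the index was likely built with one model and queried with another. The failure is easy to reproduce:

```python
import numpy as np

# Cosine similarity needs both embeddings in the same space. Mixing an
# index built with one embedding model (1536 dims) and a query embedded
# with another (384 dims) breaks the dot product inside similarity().
stored = np.zeros(1536)  # embedding from the model the index was built with
query = np.zeros(384)    # embedding from a different model at query time

try:
    np.dot(stored, query)
except ValueError as e:
    print(e)  # shapes (1536,) and (384,) not aligned ...
```

Pinning the same embed model in the service context for every index and every query engine avoids the mismatch.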