
I have a hierarchy of N (more than 10) RouterQueryEngines, each composed of one of the following QueryEngineTools: Summary, Keyword, and Vector. The documents they are built on are large. The selector being used is the PydanticSingleSelector.

I ideally want to put another RouterQueryEngine on top of all of these RouterQueryEngines and use PydanticMultiSelector to answer potentially complex questions. The problem that I'm facing is the Summarizer field. It's far too time and computationally expensive to be practical.

Just the summarizer on one of the sub-RouterQueryEngines (SingleSelector, a small set of nodes, sub-100) leads to slow performance (can't use async; I'll run into rate limits with gpt-3.5). I can't imagine what would happen with a larger search space.

I'm not sure what to do because this seems like the ideal architecture that I would want (can't really test to confirm). What are my options? I tried the SimpleSummarizer but that just led to errors. Compact wasn't useful either.

Thank you for fielding all my questions recently. You guys are the real MVPs.
Curious what others say here as well, but I think this is just a limitation of trying to do a Summary Index. You can try a different response mode, but do you need to summarize all 100 nodes?
I don't think I'm following - the summary index isn't the problem, it's having to specify a summarizer with RouterQueryEngine:

Plain Text
RouterQueryEngine.from_defaults(
      selector=PydanticSingleSelector.from_defaults(verbose=True),
      query_engine_tools=[
          self._get_summary_index_query_engine_tool(),
          self._get_vector_index_query_engine_tool(),
          self._get_keyword_index_query_engine_tool(),
      ],
      summarizer=ResponseMode.TREE_SUMMARIZE, # this part is the issue
      service_context=self.get_service_context()
)

Regardless of the ResponseMode I try, the result is the same
And some indices are going to be composed of 10k+ nodes, so the summarizer approach is just not workable
I've never worked with the RouterQueryEngine, so I'll let @Logan M add some insight, but per https://gpt-index.readthedocs.io/en/stable/examples/low_level/router.html#define-routerqueryengine it looks like it only summarizes for multiple responses
The summarizer in the router query engine is only used to summarize/aggregate responses from all sub-responses.

In your example below, since you are using a single selector, it won't even be used actually. It will just query whatever sub-index is selected and return that response
If each sub-index needs to query a ton of data, that's really the issue
Summarizing a large index is a bottleneck you can't really get around
You can set response_mode="tree_summarize" and use_async=True, but at the end of the day you are still sending a lot of data to the LLM
Hmm. Thanks for clarifying and pointing out my actual issue.

Let me think about this specific problem a little bit more.

Speaking generally, I wonder if there's a way to make summarization more efficient. Like, I wonder if it's possible to create a doc of summaries and then stick a vector index on top of it - so, precomputed summaries, and just query those.
I don't really understand how summarization works generally, so it's probably a dumb idea born of a lack of understanding.
Alternatively, what if summarization could be “cached” - meaning, as summaries are created they are stored and reused in the future - similar to I guess memoization and recursion
So summarization largely depends on the query.

You might have a query like "Summarize this document" which is quite general. Or you might have a query like "Summarize all mentions of topic X" which is quite targeted. But both are summaries.

In general, if you use the tree summarize approach, it works by building a bottom up tree of summaries. Wherein text is summarized, and then pairs of summaries are summarized, until you have a root summary that is returned
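A rough sketch of that bottom-up process, with a stub in place of the LLM call (this is an illustration of the idea, not LlamaIndex's actual implementation):

```python
def llm_summarize(texts: list[str]) -> str:
    # Stub: a real implementation would prompt the LLM with the user's
    # query plus the concatenated texts.
    return " / ".join(t[:20] for t in texts)

def tree_summarize(chunks: list[str], fanout: int = 2) -> str:
    # Repeatedly summarize groups of `fanout` summaries until a single
    # root summary remains.
    level = chunks
    while len(level) > 1:
        level = [
            llm_summarize(level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]

root = tree_summarize(["chunk one", "chunk two", "chunk three", "chunk four"])
```

With N chunks and a fanout of 2, this makes roughly N - 1 LLM calls, which is why summarizing a 10k-node index is so expensive.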
If you only expect to have one type of summary query, you could definitely cache it
But I suspect some parts of the summary tree will repeat
but maybe it will only be leaves
like "summarize barack obama's life" - maybe summarizes his early life, education, ... all of which can be reused, but then i guess the aggregation summary is where things get unique to the query
Alternatively, I wonder if there's a way to determine from the query the nodes that will be useful (reduce the search space) and then summarize those..?
I think nothing will repeat, since the prompt to summarize includes the users query 🤔
That would just be a vector index I think ha, likely with either a larger top k or a similarity threshold, and of course the tree summarize response mode
I need to read - I'm sorry
do you think it's worthwhile, then, to put a router on top of two vector indices, where one has this configuration and the other is the default (which I assume would be more efficient in non-summarization scenarios)?
Definitely worth a shot! If you want to use a similarity threshold, there is a node postprocessor for it that you can pass into the query engine
https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/node_postprocessors/modules.html#similaritypostprocessor

Plain Text
index.as_query_engine(similarity_top_k=20, node_postprocessors=[...])
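Conceptually, a similarity postprocessor just drops retrieved nodes whose score falls below a cutoff before they reach the response synthesizer; a minimal sketch of that filtering step (the scores and cutoff here are illustrative, not from the library):

```python
def filter_by_similarity(nodes_with_scores: list[tuple[str, float]],
                         cutoff: float = 0.7) -> list[tuple[str, float]]:
    # Keep only nodes whose retrieval similarity meets the cutoff,
    # shrinking the set of nodes sent on to summarization.
    return [(node, score) for node, score in nodes_with_scores
            if score >= cutoff]

retrieved = [("node A", 0.91), ("node B", 0.72), ("node C", 0.55)]
kept = filter_by_similarity(retrieved)  # node C falls below the cutoff
```

Paired with a larger `similarity_top_k`, this lets the retriever cast a wide net while the cutoff trims the search space before the expensive tree-summarize step.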
when exactly is a post_processor applied?
After retrieving nodes, and before sending them to the response synthesizer
Not all heroes wear capes
!!! It works!!! (for a small-ish set)
Plain Text
 print(query_engine.query(query))
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/query_engine/router_query_engine.py", line 145, in _query
    responses.append(selected_query_engine.query(query_bundle))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
/indices/vector_store/retrievers/retriever.py", line 164, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/vector_stores/simple.py", line 221, in query
    top_similarities, top_ids = get_top_k_embeddings(
                                ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/embedding_utils.py", line 30, in get_top_k_embeddings
    similarity = similarity_fn(query_embedding_np, emb)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/embeddings/base.py", line 48, in similarity
    product = np.dot(embedding1, embedding2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)

😢
It works with the single RouterQueryEngine when querying that directly, at least it did. Once I started querying the MultiSelector RouterQueryEngine, the above happened
found this: https://discord.com/channels/1059199217496772688/1059200010622873741/1101139253154549800

I think that's my issue. I used OpenAI for part and HF for others. I'll fix tomorrow