Curious what others say here as well, but I think this is just a limitation of trying to do a SummaryIndex. You can try a different response mode, but do you really need to summarize all 100 nodes?
I don't think I'm following - the summary index isn't the problem, it's having to specify a summarizer with RouterQueryEngine:
RouterQueryEngine.from_defaults(
    selector=PydanticSingleSelector.from_defaults(verbose=True),
    query_engine_tools=[
        self._get_summary_index_query_engine_tool(),
        self._get_vector_index_query_engine_tool(),
        self._get_keyword_index_query_engine_tool(),
    ],
    summarizer=ResponseMode.TREE_SUMMARIZE,  # this part is the issue
    service_context=self.get_service_context(),
)
Regardless of the ResponseMode I try, the result is the same
And some indices are going to be composed of 10k+ nodes, so the summarizer approach just isn't going to work for me
The summarizer in the router query engine is only used to summarize/aggregate responses from all sub-responses.
In your example above, since you are using a single selector, it won't even be used actually. It will just query whatever sub-index is selected and return that response
If each sub-index needs to query a ton of data, that's really the issue
Summarizing a large index is a bottleneck you can't really get around
You can set response_mode="tree_summarize" and use_async=True, but at the end of the day you are still sending a lot of data to the LLM
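As a config fragment, that suggestion would look something like this (a sketch assuming the pre-0.9 `llama_index` API used elsewhere in this thread; `index` stands for an already-built index):

```python
# Sketch: bottom-up summarization with concurrent LLM calls
# (`index` is assumed to be an already-built index, e.g. a summary index)
query_engine = index.as_query_engine(
    response_mode="tree_summarize",  # build a bottom-up tree of summaries
    use_async=True,                  # issue the per-chunk LLM calls concurrently
)
response = query_engine.query("Summarize this document")
```

Even with async fan-out, every node's text still passes through the LLM at least once, which is the cost that can't be avoided.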
Hmm. Thanks for clarifying and pointing out my actual issue.
Let me think about this specific problem a little bit more.
Speaking generally, I wonder if there's a way to make summarization more efficient. Like, could you create a doc of summaries and then stick a vector index on top of it - so precomputed summaries, and you just query those?
I don’t really understand how summarization works generally so it’s probably a dumb idea out of lack of understanding
Alternatively, what if summarization could be "cached" - meaning, as summaries are created they are stored and reused in the future - similar, I guess, to memoization in recursion
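The memoization idea could be sketched in plain Python. The `summarize` callable below is a toy stand-in for an LLM call (not llama_index API); the cache key is a hash of the chunk text, so an identical chunk is only "summarized" once:

```python
import hashlib
from typing import Callable


def make_cached_summarizer(summarize: Callable[[str], str]):
    """Wrap a summarizer so repeated text is only summarized once.

    `summarize` stands in for an LLM call; the cache key is a hash of
    the input text, so identical chunks (e.g. leaf nodes of a summary
    tree) reuse the stored summary instead of paying for another call.
    """
    cache: dict[str, str] = {}
    calls = {"count": 0}

    def cached(text: str) -> str:
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in cache:
            calls["count"] += 1  # only cache misses hit the "LLM"
            cache[key] = summarize(text)
        return cache[key]

    cached.calls = calls  # expose the miss count for inspection
    return cached


# Toy "summarizer": just take the first 20 characters
summarizer = make_cached_summarizer(lambda t: t[:20])
summarizer("Barack Obama's early life ...")
summarizer("Barack Obama's early life ...")  # cache hit, no second call
```

Whether this helps in practice depends on the point made below: if the summarization prompt embeds the user's query, identical inputs rarely recur across queries.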
So summarization largely depends on the query.
You might have a query like "Summarize this document" which is quite general. Or you might have a query like "Summarize all mentions of topic X" which is quite targeted. But both are summaries.
In general, if you use the tree summarize approach, it works by building a bottom-up tree of summaries: chunks of text are summarized, then pairs of summaries are summarized, until you have a single root summary that is returned
If you only expect to have one type of summary query, you could definitely cache it
But I suspect some parts of the summary tree will repeat
but maybe it will only be leaves
like "summarize barack obama's life" - maybe summarizes his early life, education, ... all of which can be reused, but then i guess the aggregation summary is where things get unique to the query
Alternatively, I wonder if there's a way to determine from the query the nodes that will be useful (reduce the search space) and then summarize those..?
I think nothing will repeat, since the prompt to summarize includes the users query 🤔
That would just be a vector index I think ha, likely with either a larger top k or a similarity threshold, and of course the tree summarize response mode
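The "larger top k plus a similarity threshold" retrieval being described can be illustrated in plain Python (a toy cosine-similarity retriever, not llama_index internals):

```python
import math


def retrieve(query_emb, node_embs, top_k=10, cutoff=0.7):
    """Toy retriever: rank nodes by cosine similarity, keep a large
    top_k, then drop anything scoring below the cutoff.
    `node_embs` maps node id -> embedding (same dimension as the query!).
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scored = sorted(
        ((cosine(query_emb, e), nid) for nid, e in node_embs.items()),
        reverse=True,
    )
    return [nid for score, nid in scored[:top_k] if score >= cutoff]
```

The surviving nodes would then be fed to the tree-summarize response synthesizer.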
I need to read - I'm sorry
do you think it's worthwhile to put a router on top of two vector indices then where one has this configuration and the other one is the default (which I assume would be more efficient in non-summarization scenarios)?
when exactly is a post_processor applied?
After retrieving nodes, and before sending them to the response synthesizer
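That ordering can be sketched as a toy pipeline (plain Python, with hypothetical `retriever`/`synthesizer` callables standing in for the real components):

```python
# Toy query pipeline showing where a node postprocessor runs:
# after the retriever returns scored nodes, before the response
# synthesizer ever sees them.
def query(question, retriever, postprocessors, synthesizer):
    nodes = retriever(question)        # 1. retrieve candidate nodes
    for pp in postprocessors:          # 2. postprocess (filter/rerank)
        nodes = pp(nodes)
    return synthesizer(question, nodes)  # 3. synthesize the answer


# Example postprocessor: drop nodes under a similarity-score threshold
def drop_low(nodes, cutoff=0.5):
    return [n for n in nodes if n["score"] >= cutoff]
```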
Not all heroes wear capes
!!! It works!!! (for a small-ish set)
print(query_engine.query(query))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
response = self._query(str_or_query_bundle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/query_engine/router_query_engine.py", line 145, in _query
responses.append(selected_query_engine.query(query_bundle))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
response = self._query(str_or_query_bundle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
/indices/vector_store/retrievers/retriever.py", line 164, in _get_nodes_with_embeddings
query_result = self._vector_store.query(query, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/vector_stores/simple.py", line 221, in query
top_similarities, top_ids = get_top_k_embeddings(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/indices/query/embedding_utils.py", line 30, in get_top_k_embeddings
similarity = similarity_fn(query_embedding_np, emb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kirolosshahat/Desktop/projects/ask-the-fathers/venv/lib/python3.11/site-packages/llama_index/embeddings/base.py", line 48, in similarity
product = np.dot(embedding1, embedding2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (1536,) and (384,) not aligned: 1536 (dim 0) != 384 (dim 0)
😢
It works with the single RouterQueryEngine when querying that directly, at least it did. Once I started querying the MultiSelector RouterQueryEngine, the above happened
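For what it's worth, that ValueError usually means two different embedding models got mixed: 1536 is the output dimension of OpenAI's text-embedding-ada-002, while 384 is typical of small sentence-transformers models, so the index was likely built with one model and queried with another. The failure is easy to reproduce:

```python
import numpy as np

# Cosine similarity needs both embeddings in the same space. Mixing an
# index built with one embedding model (1536 dims) and a query embedded
# with another (384 dims) breaks the dot product inside similarity().
stored = np.zeros(1536)  # embedding from the model the index was built with
query = np.zeros(384)    # embedding from a different model at query time

try:
    np.dot(stored, query)
except ValueError as e:
    print(e)  # shapes (1536,) and (384,) not aligned ...
```

Pinning the same embed model in the service context for every index and every query engine avoids the mismatch.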