Updated last year

I'm having trouble understanding a certain aspect, and I would greatly appreciate your clarification.

I've grasped that the fundamental structure of LlamaIndex encompasses three stages: Indexing, Retrieval, and Generation. However, I'm puzzled as to why the query inputs for both the Retrieval and Generation phases are identical.

To illustrate, when I use the command index.as_query_engine().query("Summarize hogehoge in 300 words or less."), it appears to function effectively. Yet, it seems illogical for the Retrieval phase to incorporate the specification "in 300 words or less" as part of its query. My assumption is that only "Summarize hogehoge" should be directed towards Retrieval, and the complete phrase "Summarize hogehoge in 300 words or less" should be reserved for Generation. Could you please confirm if my understanding is correct?

Furthermore, if my interpretation is accurate, is there a way to implement this in practice? Ideally, I envision a scenario where one could perform generation based on nodes retrieved via a command like nodes = index.as_retriever().retrieve("Summarize hogehoge").

Thank you.
If I understood correctly, you want the prompt to be filtered down to the part that is most relevant for retrieval, and the extra details added back when synthesizing an answer? In that case you would have to make more API calls, which has some downsides.

In general I'd say the extra details in the prompt that are needed for generation won't degrade the performance of the retrieval to any significant degree.

Also, those types of summarization queries are better suited to modules that don't rely on semantic search, since in this case you would only be summarizing the most relevant nodes.

Have you noticed a difference in the retrieval quality or is there another reason you're exploring this?
@Teemu
Thanks for the reply.

In general I'd say the extra details in the prompt that are needed for generation won't degrade the performance of the retrieval to any significant degree.

I'm wondering about the "what if" part.

In the case of summarization it would certainly have little impact, but I wonder whether a long extra instruction, such as "Please output in a specific JSON format {JsonSchema}", would affect the relevance of the retrieved nodes.
You can definitely retrieve nodes with index.as_retriever().retrieve("text") and then pass those nodes to a response synthesizer.

You could also introduce a step to rewrite the query for retrieval
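A minimal sketch of that rewrite step in plain Python (the regex patterns here are purely illustrative assumptions; in practice you would usually have an LLM rewrite the query rather than rely on hand-written rules):

```python
import re

# Illustrative patterns for instructions that matter for generation but
# add little to a semantic-search query. A real setup would typically
# ask an LLM to do this rewriting instead of hand-written regexes.
INSTRUCTION_PATTERNS = [
    r"\s*in \d+ words or less\.?",
    r"\s*Please output in a specific JSON format.*",
]

def retrieval_query(prompt: str) -> str:
    """Strip generation-only instructions, keeping the search-worthy part."""
    query = prompt
    for pattern in INSTRUCTION_PATTERNS:
        query = re.sub(pattern, "", query, flags=re.IGNORECASE | re.DOTALL)
    return query.strip()

print(retrieval_query("Summarize hogehoge in 300 words or less."))
# → Summarize hogehoge
```

You would then retrieve with retrieval_query(prompt) and synthesize with the original, full prompt.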

This all requires using some lower-level APIs, but we have many examples.

For example, here you can use the retriever and response synthesizer independently without a query engine wrapping it (just ignore the part that uses a query engine here)
https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/usage_pattern.html#low-level-composition-api
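A rough sketch of that low-level composition (assuming a recent llama-index release where imports live under llama_index.core, documents in a local ./data directory, and an LLM/embedding backend already configured):

```python
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    get_response_synthesizer,
)

# Build an index; assumes documents in ./data and API credentials set up.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve with only the search-worthy part of the prompt...
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("Summarize hogehoge")

# ...then synthesize with the full instruction, length constraint included.
synthesizer = get_response_synthesizer()
response = synthesizer.synthesize(
    "Summarize hogehoge in 300 words or less.", nodes=nodes
)
print(response)
```

This separates the retrieval query from the generation prompt exactly as described in the original question.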

Or you can build your own pipeline
https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline.html
@Logan M
Oh, "Defining a Custom Query Engine" is what I was looking for!
Now I can separate search and generate queries!
Thank you very much!!
only "Summarize hogehoge" should be directed towards Retrieval
Yes.

Let me raise an example:
If you included the phrase "in 300 words or less" when querying something, the query engine might match high-school-level English 101 practice materials more strongly than the hogehoge Wikipedia page.

In addition to the two routes Logan suggested, I would also recommend exploring the option of using an Agent instead. You would need to convert the Query Engine at hand into a Tool that the Agent can wield. The Agent knows how to distill the search-worthy keywords from your query.
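A sketch of that Agent route (imports assume the llama_index.core layout; the index, llm, and tool name are assumptions, not taken from the thread):

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Wrap the existing query engine as a tool the agent can call.
# `index` and `llm` are assumed to be set up elsewhere.
tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="hogehoge_search",
    description="Looks up and summarizes information about hogehoge.",
)

agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)

# The agent decides what search-worthy text to pass to the tool,
# while the length constraint stays in the overall instruction.
response = agent.chat("Summarize hogehoge in 300 words or less.")
print(response)
```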

I think your confusion could have been avoided if you had added an observability tool to your setup. It breaks down each step for you (see screenshot attached). (I'm using Arize Phoenix, which only took 4 lines of code to spin up.)