Hello all, I have a RAG application that returns structured data from documents such as invoices and purchase orders. My aim is to retrieve two sets of information from those documents: 1) Header information such as supplier details, customer details, document date, tax amounts, total amounts, etc. 2) Table information for line items, such as description, quantity, unit price, and sub-total amount.
After I parse a document, I split it into nodes, chunk them, and build a retrieval engine with a reranker on top of it. It is somewhat similar to this example:
https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/FlagEmbeddingReranker/
I am currently sending two user queries (one to retrieve header information and another to retrieve table information). In each of those prompts, I ask for all the information that exists in either the header or the table. In this case, does having the reranker in the retrieval engine make a difference in retrieval accuracy?
For example, I have two nodes for header information. One node contains the document date, and the other contains the total amounts. If my query asks for both pieces of information at the same time, both nodes will be retrieved and then passed to the LLM for generation, right? If that is the case, having the reranker seems redundant?
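
To make the scenario concrete, here is a minimal sketch in plain Python of the two-stage setup as I understand it. This is not LlamaIndex code: `Node`, `retrieve`, `rerank`, the two toy scoring functions, and the sample node texts are all hypothetical stand-ins I made up for illustration, and a real embedding retriever or cross-encoder reranker would score very differently.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    text: str


def overlap_score(query: str, text: str) -> float:
    """Toy first-stage score: fraction of query words that appear in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def jaccard_score(query: str, text: str) -> float:
    """Toy second-stage score (stand-in for a reranker model): Jaccard similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if (q | t) else 0.0


def retrieve(query: str, nodes: list[Node], top_k: int) -> list[Node]:
    """First stage: keep the top_k nodes by the first-stage score."""
    return sorted(nodes, key=lambda n: overlap_score(query, n.text), reverse=True)[:top_k]


def rerank(query: str, candidates: list[Node], top_n: int) -> list[Node]:
    """Second stage: re-score only the retrieved candidates and keep top_n of them."""
    return sorted(candidates, key=lambda n: jaccard_score(query, n.text), reverse=True)[:top_n]


# Hypothetical nodes from one invoice (made-up text).
nodes = [
    Node("header_date", "document date 2024-01-15 invoice number inv-001"),
    Node("header_total", "total amounts due 1180.00 tax amounts 180.00"),
    Node("table_1", "description widget quantity 10 unit price 100.00"),
    Node("footer", "thank you for your business"),
]
query = "document date and total amounts"

retrieved = retrieve(query, nodes, top_k=3)

# Case A: top_n equals top_k, so the reranker can only reorder the same nodes.
same_size = rerank(query, retrieved, top_n=3)

# Case B: top_n is smaller than top_k, so the reranker also drops nodes.
trimmed = rerank(query, retrieved, top_n=2)

print([n.node_id for n in retrieved])
print([n.node_id for n in same_size])
print([n.node_id for n in trimmed])
```

In this toy version, the reranker can never add a node that the first stage missed. It only changes which nodes reach the LLM when `top_n` is smaller than `top_k`; otherwise it only changes their order. Whether the same holds in practice for my setup is exactly what I am asking.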