To get the retrieved nodes without generating a response, use the `retrieve` method instead of the `query` method. Here's how you can do it:

```python
from llama_index.core.retrievers import RouterRetriever

retriever = RouterRetriever(
    selector,
    retriever_tools,
    llm=llm,
)
```
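If you don't already have the `selector` and `retriever_tools` objects, here is a minimal sketch of how they might be built. The indexes (`vector_index`, `keyword_index`) and `llm` are placeholders, not part of the original example:

```python
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import RetrieverTool

# Placeholder indexes -- substitute your own.
vector_retriever = vector_index.as_retriever()
keyword_retriever = keyword_index.as_retriever()

retriever_tools = [
    RetrieverTool.from_defaults(
        retriever=vector_retriever,
        description="Useful for semantic similarity lookups.",
    ),
    RetrieverTool.from_defaults(
        retriever=keyword_retriever,
        description="Useful for keyword-based lookups.",
    ),
]

# An LLM-based selector that decides which retriever handles a query.
selector = LLMSingleSelector.from_defaults(llm=llm)
```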
Instead of using the `query` method, you can use the `retrieve` method directly on the retriever:

```python
from llama_index.core import QueryBundle

query_bundle = QueryBundle("Your query here")
retrieved_nodes = retriever.retrieve(query_bundle)
```
If you're using a `RetrieverQueryEngine`, you can access the underlying retriever and use its `retrieve` method:

```python
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(retriever=your_retriever)
retrieved_nodes = query_engine.retriever.retrieve(query_bundle)
```
The `retrieved_nodes` will be a list of `NodeWithScore` objects, which contain the retrieved nodes along with their relevance scores.
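For example, you can inspect each node's score and text like this (a small sketch using the `retrieved_nodes` from the call above):

```python
for node_with_score in retrieved_nodes:
    # Each NodeWithScore wraps the underlying node plus its relevance score.
    print(node_with_score.score)
    print(node_with_score.node.get_content()[:200])
```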
To use a custom prompt when synthesizing the final answer, first define a `PromptTemplate`:

```python
from llama_index.core.prompts import PromptTemplate

custom_prompt_template = PromptTemplate(
    "Answer the following question based on the given context:\n"
    "Context: {context_str}\n"
    "Question: {query_str}\n"
    "Answer: "
)
```
Next, create a `SummaryIndex` using the reranked nodes:

```python
from llama_index.core import SummaryIndex

summary_index = SummaryIndex([n.node for n in reranked_nodes])
```
Then create a query engine from the `SummaryIndex` with your custom prompt:

```python
query_engine = summary_index.as_query_engine(
    text_qa_template=custom_prompt_template
)
```
Finally, generate a response:

```python
response = query_engine.query(query)
```
Here's the complete code:

```python
from llama_index.core import SummaryIndex
from llama_index.core.prompts import PromptTemplate

# Assuming you already have the reranked nodes
# nodes = index_by_flattened_namespace_section.router_retriever.retrieve(query)
# reranked_nodes = reranker.postprocess_nodes(nodes, query_str=query)

# Create a custom prompt template
custom_prompt_template = PromptTemplate(
    "Answer the following question based on the given context:\n"
    "Context: {context_str}\n"
    "Question: {query_str}\n"
    "Answer: "
)

# Create a SummaryIndex from the reranked nodes
summary_index = SummaryIndex([n.node for n in reranked_nodes])

# Create a query engine with the custom prompt
query_engine = summary_index.as_query_engine(
    text_qa_template=custom_prompt_template
)

# Generate a response
response = query_engine.query(query)
print(response)
```
The `SummaryIndex` is used here as a convenient way to create a new index from the reranked nodes, which can then be queried with your custom prompt.

Alternatively, you can skip building an index and pass the reranked nodes to a response synthesizer directly. First, define your custom prompt:

```python
from llama_index.core.prompts import PromptTemplate

custom_prompt = PromptTemplate(
    "Answer the following question based on the given context:\n"
    "Context: {context_str}\n"
    "Question: {query_str}\n"
    "Answer: "
)
```
Then create a response synthesizer that uses this prompt:

```python
from llama_index.core.response_synthesizers import get_response_synthesizer

response_synthesizer = get_response_synthesizer(
    response_mode="compact",
    text_qa_template=custom_prompt
)
```
Finally, synthesize the response directly from the reranked nodes:

```python
from llama_index.core import QueryBundle

# Assuming you already have the reranked nodes
# reranked_nodes = reranker.postprocess_nodes(nodes, query_str=query)

query_bundle = QueryBundle(query)
response = response_synthesizer.synthesize(
    query_bundle,
    nodes=reranked_nodes
)
print(response)
```
The `response_mode` parameter controls how the final answer is synthesized from the retrieved chunks. The available modes are:

- `refine`: This mode creates and refines an answer by sequentially going through each retrieved text chunk. It makes a separate LLM call for each node/retrieved chunk. It's good for generating more detailed answers.
- `compact` (default): Similar to `refine`, but it compacts (concatenates) the chunks beforehand, resulting in fewer LLM calls. It's more efficient than `refine` while still providing detailed answers.
- `tree_summarize`: This mode queries the LLM using a summary template for all concatenated chunks, then recursively summarizes the results until there's only one final answer. It's particularly useful for summarization purposes.
- `simple_summarize`: This mode truncates all text chunks to fit into a single LLM prompt. It's good for quick summarization but may lose detail due to truncation.
- `no_text`: This mode only runs the retriever to fetch the nodes that would have been sent to the LLM, without actually sending them. The retrieved nodes can be inspected by checking `response.source_nodes` (see the sketch after this list).
- `accumulate`: This mode applies the query to each text chunk separately while accumulating the responses into an array. It returns a concatenated string of all responses. It's useful when you need to run the same query against each text chunk individually.
- `compact_accumulate`: This is similar to `accumulate`, but it "compacts" each LLM prompt similar to the `compact` mode before running the query against each text chunk.
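As a quick illustration of `no_text`, here's a minimal sketch (reusing the `query_bundle` and `reranked_nodes` variables from the examples above):

```python
from llama_index.core.response_synthesizers import get_response_synthesizer

# Build a synthesizer that skips the LLM call entirely.
no_text_synthesizer = get_response_synthesizer(response_mode="no_text")

response = no_text_synthesizer.synthesize(query_bundle, nodes=reranked_nodes)

# The answer text is empty, but the nodes that would have been sent
# to the LLM are available on the response object.
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:100])
```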
To clarify how to set the `response_mode` in the `get_response_synthesizer` function based on the information you've shared: the `response_mode` parameter in the `get_response_synthesizer` function is indeed a valid argument, and it's of type `ResponseMode`. Here's a more accurate explanation of how to use it and what options are available:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode

response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.COMPACT
)
```
`ResponseMode` is an enumeration that defines different modes for synthesizing responses. Based on the LlamaIndex documentation on Response Modes, the available options are:

- `ResponseMode.REFINE`
- `ResponseMode.COMPACT` (default)
- `ResponseMode.TREE_SUMMARIZE`
- `ResponseMode.SIMPLE_SUMMARIZE`
- `ResponseMode.NO_TEXT`
- `ResponseMode.CONTEXT_ONLY`
- `ResponseMode.ACCUMULATE`
- `ResponseMode.COMPACT_ACCUMULATE`
For example:

```python
# Using TREE_SUMMARIZE mode
response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.TREE_SUMMARIZE
)

# Using REFINE mode
response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.REFINE
)
```
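Note that `ResponseMode` is a string-based enum in recent LlamaIndex versions (worth verifying against the version you have installed), so passing the plain string shown earlier should be equivalent to passing the enum member:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode

# These two calls configure the same mode, assuming a string-based enum.
synth_from_enum = get_response_synthesizer(response_mode=ResponseMode.TREE_SUMMARIZE)
synth_from_str = get_response_synthesizer(response_mode="tree_summarize")
```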
You can choose the appropriate `ResponseMode` based on your specific needs for response generation. The behavior of each mode is as described in my previous response. For more details on `get_response_synthesizer` and the available options, you can refer to the LlamaIndex API Reference for `get_response_synthesizer`.
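If you'd rather use the configured synthesizer inside a full query engine instead of calling it directly, one option (a sketch, assuming you already have a `retriever` as in the earlier examples) is to pass it to `RetrieverQueryEngine`:

```python
from llama_index.core.query_engine import RetrieverQueryEngine

# Combine the retriever with the configured response synthesizer.
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

response = query_engine.query("Your query here")
print(response)
```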
Here's the signature of the `synthesize` method:

```python
def synthesize(
    self,
    query_bundle: QueryBundle,
    nodes: List[NodeWithScore],
    additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
) -> RESPONSE_TYPE:
```
You can create a `QueryBundle` like this:

```python
from llama_index.core import QueryBundle

query_bundle = QueryBundle(
    query_str="What did the author do growing up?",
    custom_embedding_strs=["The author grew up painting."],
)
```
Then, when you call the `synthesize` method of a response synthesizer, you pass the `QueryBundle` along with the retrieved nodes:

```python
response = response_synthesizer.synthesize(
    query_bundle,
    nodes=reranked_nodes
)
```
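After synthesis you can inspect both the answer text and the nodes that informed it; a brief sketch (attribute names are those of the standard `Response` object):

```python
# The generated answer text.
print(str(response))

# The nodes that were passed to the synthesizer, with their scores.
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])
```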