----------------------
In LlamaIndex, you can use a BM25-based model in combination with a vector-based model to create a hybrid retriever. This allows you to capture both semantic relations and keywords in your input queries. Here's how you can do it:
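These snippets assume index is an existing VectorStoreIndex whose nodes live in its in-memory docstore (which the BM25 retriever reads from). As a minimal sketch, assuming your files sit in a local data/ directory (a placeholder path), you could build one like this:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# Load documents from a local folder and build an in-memory vector index;
# its docstore holds the parsed nodes that BM25Retriever will index.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)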
First, you create your retrievers. Each will retrieve the top-10 most similar nodes.
from llama_index.retrievers.bm25 import BM25Retriever
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=10
)
Next, create your fusion retriever, which returns the top-10 most similar nodes out of the 20 nodes returned by the two retrievers.
from llama_index.core.retrievers import QueryFusionRetriever
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
    mode="relative_score",
    use_async=True,
    verbose=True,
)
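If you set num_queries above 1, the fusion retriever also asks an LLM to generate query variations and fuses results across all of them. A hedged sketch, assuming an LLM is configured via Settings (retriever_with_gen is just an illustrative name):
retriever_with_gen = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # the original query plus 3 LLM-generated variations
    mode="relative_score",
)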
You can then use the retriever to retrieve nodes relevant to a specific query.
nodes_with_scores = retriever.retrieve(
    "What happened at Interleaf and Viaweb?"
)
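Each returned item is a NodeWithScore, so you can inspect the fused scores directly, or wrap the retriever in a query engine to synthesize an answer. A minimal sketch, assuming an LLM is configured via Settings:
# Print the fused score and a snippet of each retrieved node.
for n in nodes_with_scores:
    print(f"{n.score:.3f} -> {n.node.get_content()[:100]}")

# Optionally, plug the retriever into a query engine for an end-to-end answer.
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(response)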
For more details, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------