----------------------
In LlamaIndex, you can use a BM25-based model in combination with a vector-based model to create a hybrid retriever. This allows you to capture both semantic relations and keywords in your input queries. Here's how you can do it:
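These snippets assume index is an existing VectorStoreIndex whose nodes live in its in-memory docstore (which the BM25 retriever reads from). As a minimal sketch, assuming your files sit in a local data/ directory (a placeholder path), you could build one like this:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# Load documents from a local folder and build an in-memory vector index;
# its docstore holds the parsed nodes that BM25Retriever will index.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)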
First, you create your retrievers. Each will retrieve the top-10 most similar nodes.
from llama_index.retrievers.bm25 import BM25Retriever
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=10
)
Next, create your fusion retriever, which returns the top-10 most similar nodes out of the 20 nodes returned by the two retrievers.
from llama_index.core.retrievers import QueryFusionRetriever
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
    mode="relative_score",
    use_async=True,
    verbose=True,
)
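If you set num_queries above 1, the fusion retriever also asks an LLM to generate query variations and fuses results across all of them. A hedged sketch, assuming an LLM is configured via Settings (retriever_with_gen is just an illustrative name):
retriever_with_gen = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # the original query plus 3 LLM-generated variations
    mode="relative_score",
)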
You can then use the retriever to retrieve nodes relevant to a specific query.
nodes_with_scores = retriever.retrieve(
    "What happened at Interleaf and Viaweb?"
)
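Each returned item is a NodeWithScore, so you can inspect the fused scores directly, or wrap the retriever in a query engine to synthesize an answer. A minimal sketch, assuming an LLM is configured via Settings:
# Print the fused score and a snippet of each retrieved node.
for n in nodes_with_scores:
    print(f"{n.score:.3f} -> {n.node.get_content()[:100]}")

# Optionally, plug the retriever into a query engine for an end-to-end answer.
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(response)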
For more details, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------