Updated 2 months ago

How do I use a bm25-based model for embeddings?

----------------------
BM25 is a keyword-ranking function rather than an embedding model, so it does not produce embeddings. In LlamaIndex, you use it as a retriever, and you can combine it with a vector-based retriever to build a hybrid retriever that captures both semantic relations and keywords in your queries. Here's how you can do it:

First, you create your retrievers. Each will retrieve the top-10 most similar nodes.

Plain Text
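# Requires the BM25 integration package: pip install llama-index-retrievers-bm25
from llama_index.retrievers.bm25 import BM25Retriever

# `index` is assumed to be an existing VectorStoreIndex built from your documents.
vector_retriever = index.as_retriever(similarity_top_k=10)

# BM25 ranks the nodes stored in the index's docstore by keyword match.
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=10
)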
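
For intuition about what the BM25 side contributes, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not the code that BM25Retriever runs: the whitespace tokenizer, the function name bm25_scores, the sample documents, and the k1 and b defaults are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # IDF with +1 inside the log so it is never negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term frequency saturates (k1) and is length-normalized (b).
            tf = term_freq[term]
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


sample_docs = [
    "Viaweb was an online store builder",
    "Interleaf made document software",
    "The weather was sunny",
]
sample_scores = bm25_scores("viaweb store", sample_docs)
```

Only documents that share terms with the query get a non-zero score, which is why pairing BM25 with a vector retriever helps when a query is phrased differently from the text it should match.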


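Next, you can create your fusion retriever, which merges the nodes returned by both retrievers and returns the top-10 overall. The weights follow the order of the retriever list, so here the vector retriever's scores count for 0.6 and the BM25 retriever's for 0.4.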

Plain Text
from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
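    mode="relative_score",  # rescale each retriever's scores before weighting
    use_async=True,  # in a notebook, this may require nest_asyncio.apply() first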
    verbose=True,
)
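
To see why the scores need rescaling before the weights are applied, here is a simplified sketch of weighted relative score fusion. It illustrates the general idea only and is not LlamaIndex's implementation: the function names, node IDs, and scores are made up, and min-max scaling is assumed as the normalization step.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one retriever's scores to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally relevant.
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def weighted_fusion(result_sets, weights, top_k):
    """Combine per-retriever scores into one ranked list of (node_id, score)."""
    fused = defaultdict(float)
    for scores, weight in zip(result_sets, weights):
        for node_id, norm in min_max_normalize(scores).items():
            # A node found by several retrievers accumulates their weighted scores.
            fused[node_id] += weight * norm
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical raw scores: cosine similarities vs. unbounded BM25 scores.
vector_hits = {"a": 0.82, "b": 0.75, "c": 0.60}
bm25_hits = {"b": 7.1, "d": 3.4, "c": 2.0}

top = weighted_fusion([vector_hits, bm25_hits], [0.6, 0.4], top_k=3)
top_ids = [node_id for node_id, _ in top]
```

Without the rescaling step, the BM25 scores (which have no fixed upper bound) would dominate the cosine similarities regardless of the weights. In this sketch, node "b" ranks first because both retrievers returned it.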


You can then use the retriever to retrieve nodes relevant to a specific query.

Plain Text
nodes_with_scores = retriever.retrieve(
    "What happened at Interleaf and Viaweb?"
)


For more details, you can refer to the LlamaIndex documentation.
----------------------