Updated 2 months ago

Sparse Retriever: Considering Text or Metadata in Retrieval

At a glance

The community member asks whether the BM25Retriever takes metadata of nodes into account, and if so, how to prevent it and make it consider only the text of nodes. In the comments, another community member suggests that when creating documents, you can exclude specific metadata keys from being used in the embedding process or sent to the language model for response generation.

Plain Text
sparse_retriever = BM25Retriever.from_defaults(docstore=docstore, similarity_top_k=5)

Does BM25Retriever take the metadata of nodes into account? If so, how can I prevent that and make it consider only the text of the nodes?
1 comment
You can do that when your documents are created.
Each document has fields that list metadata keys to ignore when creating the embeddings or when sending the content to the LLM.

Plain Text
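# Assumes llama-index 0.10 or later; older releases use `from llama_index import Document`
from llama_index.core import Document

doc = Document(text="This is first docu")

# Metadata keys listed here are left out of the text used for embedding
doc.excluded_embed_metadata_keys = []
# Metadata keys listed here are left out of the text sent to the LLM for response generation
doc.excluded_llm_metadata_keys = []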
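To make the effect of the exclusion lists concrete, here is a minimal pure-Python sketch of the idea. It does not use LlamaIndex; `text_seen_by_embedder`, the sample metadata keys, and the `key: value` layout are all hypothetical stand-ins, not the library's implementation. Whether BM25Retriever itself reads the metadata-augmented text or the raw text may depend on the installed version, so it is worth printing `doc.get_content(metadata_mode=MetadataMode.EMBED)` on a real document to see the actual string.

```python
def text_seen_by_embedder(text, metadata, excluded_keys):
    """Hypothetical stand-in: prepend every non-excluded metadata entry
    as a 'key: value' line in front of the document text."""
    kept = [f"{key}: {value}" for key, value in metadata.items()
            if key not in excluded_keys]
    return "\n".join(kept + [text])


def tokens(s):
    """Naive whitespace tokenizer, only for illustration."""
    return s.lower().split()


# Sample values are assumptions made up for this sketch.
text = "This is first docu"
metadata = {"file_name": "report.pdf", "author": "alice"}

# Nothing excluded: metadata terms end up in the searchable text.
with_metadata = text_seen_by_embedder(text, metadata, excluded_keys=[])

# Every key excluded: only the document text remains.
text_only = text_seen_by_embedder(text, metadata, excluded_keys=list(metadata))

print(tokens(with_metadata))
print(tokens(text_only))
```

With every key excluded, a query term such as "alice" can no longer match through metadata, which is the behavior the question asks for, assuming the retriever builds its index from this metadata-aware text.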