Updated 5 months ago

Hello, I have a question. If we use BM25 for retrieval and then add new documents that should also be retrievable, we have to re-index from scratch, right? I assume that's because the document frequencies have changed. Correct me if I'm wrong.
3 comments
Yes, BM25 needs to re-index everything when the nodes/dataset change.
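To make the reason concrete, here is a minimal sketch in plain Python (no retrieval library involved) of how the corpus-wide statistics BM25 relies on shift when a single document is added. The `idf` helper and the toy corpus are made up for illustration, and the IDF formula is the common non-negative (Lucene-style) variant, which may differ from what your BM25 implementation uses.

```python
import math

def idf(term, docs):
    """Non-negative BM25-style IDF of `term` over a list of tokenized docs."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)  # document frequency
    return math.log(1 + (n - df + 0.5) / (df + 0.5))

def avg_len(docs):
    """Average document length in tokens (the `avgdl` term in BM25)."""
    return sum(len(d) for d in docs) / len(docs)

# Hypothetical toy corpus, already tokenized.
corpus = [
    ["bm25", "ranks", "documents"],
    ["vector", "search", "ranks", "embeddings"],
]
idf_before, avgdl_before = idf("bm25", corpus), avg_len(corpus)

# Add one new document that also mentions "bm25".
corpus.append(["bm25", "needs", "statistics"])
idf_after, avgdl_after = idf("bm25", corpus), avg_len(corpus)

print(f"idf:   {idf_before:.3f} -> {idf_after:.3f}")
print(f"avgdl: {avgdl_before:.3f} -> {avgdl_after:.3f}")
```

Both numbers move, so any score or weight that was computed from the old values is stale for every document, not just the new one.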
I tried looking into how to implement hybrid search, especially with BM25 as the sparse model.

My use case is that I add 300+ documents per day. I think rebuilding the entire index 300 times every day would be very costly and inefficient.

Also, from what I understand, you need to load all the nodes into memory to run a search. I am thinking of putting the vectors into a vector database that supports sparse vectors. But then I would still need to update the sparse vectors I already stored in the vector database, right?

Sorry if this is off-topic. If you'd rather not answer this here, could you point me to a community where I can ask these kinds of questions (one that's focused on RAG)?
Yeah, that's the main drawback of BM25.
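For what it's worth, the full rebuild mostly comes from precomputing corpus-wide statistics up front. Below is a rough sketch of an alternative, assuming you are willing to write your own scorer: store only raw counts per document and derive IDF and average length at query time, so adding a document only touches that document's counts. `IncrementalBM25` is a hypothetical class, not part of any library mentioned in this thread, and it scans every document on search for simplicity (a real engine would use an inverted index).

```python
import math
from collections import Counter

class IncrementalBM25:
    """Hypothetical helper: keeps raw counts, derives IDF at query time."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = []    # per-document term frequencies
        self.doc_lens = []   # per-document token counts
        self.df = Counter()  # term -> number of documents containing it
        self.total_len = 0   # sum of all document lengths

    def add(self, tokens):
        """Add one tokenized document; cost depends only on its length."""
        tf = Counter(tokens)
        self.doc_tfs.append(tf)
        self.doc_lens.append(len(tokens))
        self.total_len += len(tokens)
        self.df.update(tf.keys())

    def score(self, query_tokens, doc_id):
        """BM25 score of one document, using the current corpus statistics."""
        n = len(self.doc_tfs)
        avgdl = self.total_len / n
        tf = self.doc_tfs[doc_id]
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[doc_id] / avgdl)
        total = 0.0
        for term in query_tokens:
            f = tf.get(term, 0)
            if f == 0:
                continue
            df = self.df[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            total += idf * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query_tokens, top_k=3):
        """Return up to top_k (score, doc_id) pairs, best first."""
        scored = ((self.score(query_tokens, i), i) for i in range(len(self.doc_tfs)))
        return sorted((s for s in scored if s[0] > 0), reverse=True)[:top_k]

index = IncrementalBM25()
index.add("bm25 ranks documents by term statistics".split())
index.add("dense vectors capture semantic similarity".split())
first = index.search(["bm25"])

# A later addition needs no rebuild; scores simply reflect the new statistics.
index.add("hybrid search combines bm25 with dense vectors".split())
second = index.search(["bm25"])
```

The trade-off is that nothing score-like is stored, so this only carries over to a vector database if it lets you apply the IDF part at query time; precomputed BM25 weights stored as sparse vectors would still go stale.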

There are other approaches to sparse embeddings, like SPLADE, that also work well here.