Hello, I want to ask. If we make a

Hello, I have a question. If we build a retriever using BM25 and later add more documents that we want to be retrievable, do we have to re-index from scratch? I assume so, because the document frequencies will have changed. Correct me if I'm wrong.

3 comments

Logan M

Yes, BM25 needs to re-index everything when the nodes/dataset change.
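To illustrate why the document frequency concern in the question is real, here is a minimal sketch in plain Python. It assumes the Lucene-style BM25 IDF formula and whitespace tokenization; the helper names `bm25_idf` and `doc_freqs` are made up for this example and do not come from any library. It shows that the IDF of a term shifts when a document is added, even if the new document does not contain that term.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Assumed Lucene-style BM25 IDF: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def doc_freqs(corpus: list[str]) -> dict[str, int]:
    """Count how many documents contain each term (whitespace tokenization)."""
    df: dict[str, int] = {}
    for doc in corpus:
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return df


corpus = ["bm25 ranks documents", "vector search ranks embeddings"]
before = bm25_idf(len(corpus), doc_freqs(corpus)["ranks"])

# The new document does not contain "ranks", yet N grows from 2 to 3,
# so the IDF of "ranks" changes for every document already indexed.
corpus.append("sparse retrieval with bm25")
after = bm25_idf(len(corpus), doc_freqs(corpus)["ranks"])

print(f"idf('ranks') before: {before:.4f}, after: {after:.4f}")
```

Any term weight that was precomputed with the old IDF is now stale, which is why an index that bakes IDF into stored weights has to be rebuilt.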

Edd

I have been looking into how to implement hybrid search, especially with BM25 as the sparse model.

My use case is that I add 300+ documents per day. I think rebuilding the entire index 300 times a day would be very costly and inefficient.

Also, from what I understand, you need to load all the nodes into memory to run a search. I am thinking of storing the vectors in a vector database that supports sparse vectors. But then I would still need to update the sparse vectors already stored in the VDB, right?

Sorry if this is off-topic. If you would rather not answer this, could you point me to a community where I can ask these kinds of questions (one that's focused on RAG)?

Logan M

Yeah, that's the main drawback of BM25.

There are other approaches to sparse embeddings, like SPLADE, that also work well here.
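For the 300+ documents per day case, one possible way to soften the drawback is to store only raw counts (term frequencies, document lengths, document frequencies) and compute IDF and the average length at query time, so that adding a document only updates counters. The sketch below is a toy under stated assumptions: `IncrementalBM25` is a hypothetical class, not part of LlamaIndex or any other library, it tokenizes on whitespace, and it still keeps everything in memory and scans every document per query. A real system would use an inverted index.

```python
import math
from collections import Counter, defaultdict


class IncrementalBM25:
    """Hypothetical helper: stores raw counts, derives IDF at query time."""

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tfs: list[Counter] = []   # per-document term counts
        self.doc_lens: list[int] = []      # per-document token counts
        self.df: dict[str, int] = defaultdict(int)  # term -> docs containing it
        self.total_len = 0

    def add(self, text: str) -> int:
        """Add one document by updating counters only; returns its id."""
        tokens = text.lower().split()
        tf = Counter(tokens)
        self.doc_tfs.append(tf)
        self.doc_lens.append(len(tokens))
        self.total_len += len(tokens)
        for term in tf:
            self.df[term] += 1
        return len(self.doc_tfs) - 1

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        """Score every document with BM25 using the current corpus statistics."""
        n = len(self.doc_tfs)
        if n == 0:
            return []
        avgdl = self.total_len / n
        terms = query.lower().split()
        scored = []
        for doc_id, tf in enumerate(self.doc_tfs):
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue
                df = self.df[term]
                idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
                length_norm = 1 - self.b + self.b * self.doc_lens[doc_id] / avgdl
                score += idf * f * (self.k1 + 1) / (f + self.k1 * length_norm)
            if score > 0:
                scored.append((doc_id, score))
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


index = IncrementalBM25()
index.add("bm25 ranks documents")
index.add("vector search ranks embeddings")
new_id = index.add("sparse retrieval with bm25")  # no rebuild needed
results = index.search("sparse bm25")
print(results)
```

The trade-off is that scoring work moves to query time. If the sparse vectors stored in a vector database already have IDF multiplied in, they would still go stale as the corpus grows, which matches the concern raised above.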
