Updated 5 months ago

Hey, I'm trying to combine a keyword

Hey, I'm trying to combine a keyword retriever with a vector retriever as a tool for the agent. My setup uses BM25, which loads documents from MongoDB, but with a big dataset it becomes overloaded. Is there any way to fix this? Please help.

Attachment

9 comments

Tungdepzai

More info: I'm trying to query tabular data with no known schema, so I can't use SQL. I'm mapping each row into a Document in ChromaDB for the vector retriever and a Document in MongoDB for the keyword retriever.

Logan M

becomes overloaded -- you mean runs out of memory or something?

Tungdepzai

yes, since it's retrieving the whole collection for every query, I believe

Logan M

not for every query (I think?), but it is when you initially load BM25.

Unfortunately that's just how BM25 works, no way around that

Logan M

it needs all the data upfront
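
To illustrate why, here is a minimal pure-Python sketch of BM25 scoring (not the LlamaIndex implementation; `build_bm25` and `bm25_scores` are hypothetical names, and the `k1`/`b` defaults are just commonly used values). The IDF table and the average document length are both computed over the whole corpus, which is why every document has to be seen before the first query can be scored.

```python
import math
from collections import Counter


def build_bm25(corpus_tokens):
    """Precompute the corpus-wide statistics that BM25 depends on."""
    n_docs = len(corpus_tokens)
    # Average document length: needs every document.
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    idf = {t: math.log(1 + (n_docs - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    return {
        "idf": idf,
        "avgdl": avgdl,
        "tfs": [Counter(doc) for doc in corpus_tokens],
        "lens": [len(doc) for doc in corpus_tokens],
    }


def bm25_scores(index, query_tokens, k1=1.5, b=0.75):
    """Score every document in the index against the query tokens."""
    scores = []
    for tf, dl in zip(index["tfs"], index["lens"]):
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            norm = tf[term] + k1 * (1 - b + b * dl / index["avgdl"])
            score += index["idf"][term] * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Adding or removing a single document changes `avgdl` and the IDF values, so the statistics cannot be built from one slice of the collection at a time without extra bookkeeping.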

Tungdepzai

everything in the image is inside a chat API, so every chat request does all of that from scratch

Tungdepzai

Is there any other keyword retriever you can recommend that can do this more efficiently?

Logan M

Buy more RAM? 😆

BM25 is the only keyword retriever that returns a score; the others can't be used with the query fusion retriever.

You could implement a custom hybrid retriever though
https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers.html

or use a vector DB that supports hybrid search
https://docs.llamaindex.ai/en/stable/examples/vector_stores/PineconeIndexDemo-Hybrid.html
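
For the custom hybrid retriever route, one way to merge results is reciprocal rank fusion, which uses only each document's position in each result list, so it does not need the retrievers to return comparable scores. This is a generic sketch, not the approach from the linked example; `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is just a commonly used constant.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Fuse several ranked lists of document IDs using ranks only."""
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returns.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)[:top_n]


# Example: IDs returned by a keyword retriever and a vector retriever.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused_ids = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear near the top of both lists end up first, and documents found by only one retriever are still kept.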

Tungdepzai

gotta do some more digging I guess 😅, thanks
