Updated 3 months ago

Hey, I'm trying to combine a keyword

Hey, I'm trying to combine a keyword retriever with a vector retriever as a tool for the agent. The setup I have uses BM25, which loads documents from MongoDB, but with a big dataset it becomes overloaded. Is there any way to fix this? Pls help
Attachment
image.png
9 comments
More info: I'm trying to query tabular data with no known schema, so I can't use SQL. I'm mapping each row into a Document in ChromaDB for the vector retriever and a Document in MongoDB for the keyword retriever
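A minimal sketch of the row-to-Document mapping described here, assuming each row arrives as a plain dict. The helper name `row_to_text` and the sample row are hypothetical; the resulting string is what would be stored as the Document text in either store.

```python
def row_to_text(row):
    """Flatten one schema-less row into 'column: value' lines, skipping empty cells."""
    parts = [f"{col}: {val}" for col, val in row.items() if val not in (None, "")]
    return "\n".join(parts)


# Hypothetical sample row; real rows would come from the tabular source.
row = {"name": "Alice", "city": "Paris", "notes": None}
text = row_to_text(row)
```

Keeping the column name next to each value lets a keyword retriever match on either one, which may help when the schema is unknown.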
becomes overloaded -- you mean runs out of memory or something?
yes, since it's retrieving the whole collection for every query, I believe
not for every query (I think?), but it does when you initially load BM25.

Unfortunately that's just how BM25 works, no way around that
it needs all the data upfront
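Since the index needs all the data upfront, one way to limit the cost is to build it once per process and reuse it, rather than rebuilding it for each request. The sketch below illustrates that idea with a tiny pure-Python BM25. `TinyBM25`, `load_rows`, `get_index`, and `handle_chat_request` are hypothetical names, and `load_rows` is only a stand-in for reading the MongoDB collection. This does not reduce peak memory; it only avoids repeating the load.

```python
import math
from collections import Counter
from functools import lru_cache


class TinyBM25:
    """Minimal Okapi BM25 over whitespace-tokenized documents (illustration only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = docs
        self.tokens = [d.lower().split() for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / max(len(self.tokens), 1)
        # Document frequency: number of documents containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query, top_k=3):
        scored = []
        for i, tf in enumerate(self.tf):
            dl = len(self.tokens[i])
            score = 0.0
            for term in query.lower().split():
                if term not in tf:
                    continue
                f = tf[term]
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                score += self.idf[term] * f * (self.k1 + 1) / norm
            if score > 0:
                scored.append((score, i))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.docs[i], s) for s, i in scored[:top_k]]


def load_rows():
    """Hypothetical stand-in for reading every row from the database."""
    return ["name alice city paris", "name bob city berlin", "name carol city paris"]


@lru_cache(maxsize=1)
def get_index():
    # Runs the expensive load and index build only on the first call.
    return TinyBM25(load_rows())


def handle_chat_request(query):
    # Every request reuses the cached index instead of reloading the collection.
    return get_index().search(query)
```

With this shape, the first chat request pays for the load and later requests only pay for scoring. The cache would need to be cleared (for example with `get_index.cache_clear()`) when the underlying data changes.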
Everything in the image runs inside a chat API, so every chat request does all of that from scratch
Is there any other keyword retriever you can recommend that can do this more efficiently?
Buy more RAM? 😆

BM25 is the only keyword retriever that returns a score; the others can't be used with the query fusion retriever.

You could implement a custom hybrid retriever though
https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers.html

or use a vector DB that supports hybrid search
https://docs.llamaindex.ai/en/stable/examples/vector_stores/PineconeIndexDemo-Hybrid.html
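For the custom hybrid retriever option, one possible way to merge results is reciprocal rank fusion (RRF), which uses only each document's position in a ranking and so does not need the keyword retriever to return a score. This is a sketch of that swapped-in technique, not necessarily what the linked example does. The function name, the constant `k=60`, and the sample row IDs are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Merge several ranked lists of document IDs using reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    so no retriever-specific scores are required.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:top_k]


# Hypothetical result lists, best match first.
keyword_hits = ["row7", "row2", "row9"]
vector_hits = ["row2", "row4", "row7"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that show up in both lists rise to the top, while documents found by only one retriever are still kept further down.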
gotta do some more digging I guess 😅, thanks