
Updated 5 months ago

So.. it seems there is no way to filter

So.. it seems there is no way to filter nodes down by strings, because none of the metadata search examples include string search (e.g. your metadata is an email subject and you want to find one string in the subject). Is that correct? I have tried every metadata approach and have been unsuccessful.

Is there some sort of approach where we can first use keyword search (e.g. a PO number, a string, etc) to narrow down the available nodes and THEN do the vector search? Am I just stupid?
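For illustration, the two-stage idea in the question (narrow by keyword first, then rank by vector similarity) can be sketched in plain Python. This is a minimal sketch under assumptions, not how any particular vector store does it: the node layout (`metadata["subject"]`, `embedding`), the toy vectors, and the helper name `keyword_then_vector` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_then_vector(nodes, keyword, query_vec, top_k=2):
    """Keep nodes whose subject contains `keyword`, then rank the survivors by similarity."""
    needle = keyword.lower()
    # Stage 1: case-insensitive substring match on a metadata field.
    candidates = [n for n in nodes if needle in n["metadata"]["subject"].lower()]
    # Stage 2: vector ranking over the reduced candidate set only.
    candidates.sort(key=lambda n: cosine(n["embedding"], query_vec), reverse=True)
    return candidates[:top_k]

# Hypothetical toy data: 2-d vectors stand in for real embeddings.
nodes = [
    {"id": "a", "metadata": {"subject": "Invoice for PO-1234"}, "embedding": [1.0, 0.0]},
    {"id": "b", "metadata": {"subject": "Re: PO-1234 shipping delay"}, "embedding": [0.0, 1.0]},
    {"id": "c", "metadata": {"subject": "Lunch on Friday"}, "embedding": [0.0, 1.0]},
]
hits = keyword_then_vector(nodes, "po-1234", query_vec=[0.1, 0.9])
hit_ids = [n["id"] for n in hits]
```

Node "c" is just as close to the query vector as "b", but it never reaches the ranking stage because its subject lacks the keyword. In a real setup the first stage would ideally run inside the database (a text filter or full-text query) rather than in Python.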
8 comments
I switched to Postgres because I saw someone say that the docs are persisted when inserted into Postgres; I set store_nodes_override=True, and I still can't retrieve them from Postgres.
@kcramp use a vector db that supports text search, or use some hybrid method like bm25
@Logan M I actually have the same question and am also using Postgres (pgvector). It works GREAT for semantic search. However, I also want to combine this with a keyword search. I see that the vector table contains a "text" column with the raw document contents. I'd imagine there's a way to customize the retriever to also do a keyword search over this column? (ideally run keyword and semantic search separately and then combine the ranked results somehow)
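One common way to combine separately ranked keyword and semantic results is reciprocal rank fusion (RRF). The sketch below is an illustration only, not something the thread or the library confirms as the built-in behavior; the node IDs, the function name, and the constant `k=60` (a conventional default) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs; each list contributes 1 / (k + rank) per ID."""
    scores = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda node_id: scores[node_id], reverse=True)

# Hypothetical result lists, best match first.
keyword_hits = ["n3", "n1", "n7"]   # e.g. from a text query over the "text" column
semantic_hits = ["n1", "n5", "n3"]  # e.g. from the pgvector similarity search
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that keyword scores and cosine similarities live on different scales. IDs that appear in both lists float to the top.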
yeah, there's a hybrid mode, using tsv, it's kind of meh lol
my current solution is to think of tags and then tag everything so I have standard metadata to filter from, but that isn't ideal
I tried BM25 but I don't understand it. It needs nodes, but I can't get the nodes back out of Weaviate. It takes 3 hours to process all of my docs, so I can't wait 3 hours to regenerate the nodes.
A standard set of fields to filter with is a very good approach imo

yea bm25 is a static encoding. If any docs are added, the entire thing needs to be re-computed.

It's using the rank-bm25 library under the hood, which doesn't provide a way to save/load it. Been meaning to update it to a faster library that supports saving/loading.
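To see why adding documents forces a recompute, here is a minimal pure-Python BM25 sketch. It is a hypothetical helper (`TinyBM25`), not the rank-bm25 implementation, and it uses one common IDF variant; the point is only that the IDF table and average document length are corpus-wide statistics fixed at build time.

```python
import math
from collections import Counter

class TinyBM25:
    """Minimal Okapi-style BM25 index; illustrative only, not rank-bm25."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [d.lower().split() for d in docs]
        # Corpus-wide statistics, computed once when the index is built.
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        """BM25 score of document `index` for a whitespace-tokenized query."""
        doc = self.docs[index]
        tf = Counter(doc)
        total = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            num = tf[term] * (self.k1 + 1)
            den = tf[term] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf[term] * num / den
        return total

corpus = ["invoice for po-1234", "shipping delay notice"]
before = TinyBM25(corpus).score("po-1234", 0)
# Adding one document changes the IDF and average length, so doc 0 scores differently.
after = TinyBM25(corpus + ["second invoice for po-1234 and po-9999"]).score("po-1234", 0)
```

The same query against the same document yields a different score once the corpus grows, which is why an index built this way has to be rebuilt (or its statistics updated) whenever documents are added.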
Qdrant recently released BM42, been meaning to try it