So, it seems there is no way to filter nodes by a substring, because none of the metadata search examples include string search (e.g. your metadata is an email subject and you want to find one string in the subject). Is that correct? I have tried every metadata approach and have been unsuccessful.
Is there some sort of approach where we can first use keyword search (e.g. a PO number, a string, etc.) to narrow down the available nodes and THEN do the vector search? Am I just stupid?
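A minimal sketch of the "keyword first, then vector search" idea, in plain Python rather than any particular library's API. The node layout (a dict with `text`, `metadata`, and `embedding`), the `subject` metadata key, and the function name `keyword_then_vector` are all assumptions made up for illustration; in a real setup the embeddings and nodes would come from your vector store.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_then_vector(nodes, keyword, query_embedding, top_k=2):
    """Hypothetical helper: substring filter first, similarity ranking second."""
    kw = keyword.lower()
    # Step 1: keep only nodes whose text or subject contains the keyword.
    candidates = [
        n for n in nodes
        if kw in n["text"].lower()
        or kw in n["metadata"].get("subject", "").lower()
    ]
    # Step 2: rank the surviving nodes by similarity to the query embedding.
    ranked = sorted(
        candidates,
        key=lambda n: cosine(n["embedding"], query_embedding),
        reverse=True,
    )
    return ranked[:top_k]

# Toy data with 2-dimensional embeddings (assumed, for illustration only).
nodes = [
    {"id": "a", "text": "Invoice for PO-1234 attached",
     "metadata": {"subject": "PO-1234 invoice"}, "embedding": [1.0, 0.0]},
    {"id": "b", "text": "Shipping update",
     "metadata": {"subject": "Re: PO-1234 delayed"}, "embedding": [0.0, 1.0]},
    {"id": "c", "text": "Lunch on Friday?",
     "metadata": {"subject": "lunch"}, "embedding": [1.0, 0.0]},
]
hits = keyword_then_vector(nodes, "po-1234", [0.9, 0.1])
```

Node "c" is close to the query in embedding space but never reaches the ranking step, because it fails the keyword filter.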
I switched to Postgres because I saw someone say that the docs are persisted when inserted into Postgres. I set store_nodes_override=True, and I still can't retrieve them from Postgres.
@Logan M I actually have the same question and am also using Postgres (pgvector). It works GREAT for semantic search. However, I also want to combine this with a keyword search. I see that the vector table contains a "text" column with the raw document contents. I'd imagine there's a way to customize the retriever to also do a keyword search over this column? (ideally run keyword and semantic search separately and then combine the ranked results somehow)
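One common way to combine two separately ranked result lists is reciprocal rank fusion (RRF). The sketch below is not something from this thread or a specific library call; it assumes you already have two lists of node IDs, one from a keyword query over the "text" column and one from the semantic search. The IDs and the constant `k=60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results, best match first.
keyword_hits = ["n3", "n1", "n7"]    # e.g. from a keyword query on the text column
semantic_hits = ["n1", "n4", "n3"]   # e.g. from the pgvector similarity search
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

RRF only looks at rank positions, so it sidesteps the problem that keyword scores and cosine similarities are not on the same scale.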
I tried BM25 but I don't understand it. It needs nodes, but I can't get the nodes back out of Weaviate. It takes 3 hours to process all of my docs, so I can't wait 3 hours to regenerate the nodes.
A standard set of fields to filter with is a very good approach imo
yea bm25 is a static encoding. If any docs are added, the entire thing needs to be re-computed.
It's using the rank-bm25 library under the hood, which doesn't provide a way to save/load it. Been meaning to update it to a faster library that supports saving/loading.
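To illustrate why adding docs forces a recompute, here is a tiny self-contained BM25 sketch. It is not rank-bm25 and not the library's retriever; the class name `TinyBM25` and its `dumps`/`loads` helpers are hypothetical. The IDF weights and average document length depend on the whole corpus, so they change whenever a document is added. Saving the tokenized corpus as JSON is one assumed workaround: reloading only refits the statistics and does not require re-processing the original documents.

```python
import json
import math
from collections import Counter

class TinyBM25:
    """Illustrative BM25 scorer over pre-tokenized documents."""

    def __init__(self, corpus_tokens, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [list(d) for d in corpus_tokens]
        self._fit()

    def _fit(self):
        # Corpus-wide statistics: these are what make the index "static".
        n = len(self.doc_tokens)
        self.avgdl = sum(len(d) for d in self.doc_tokens) / n if n else 0.0
        df = Counter(t for d in self.doc_tokens for t in set(d))
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5))
                    for t, f in df.items()}

    def add(self, tokens):
        # Adding one document changes n, avgdl, and idf, so refit everything.
        self.doc_tokens.append(list(tokens))
        self._fit()

    def scores(self, query_tokens):
        out = []
        for d in self.doc_tokens:
            tf = Counter(d)
            s = 0.0
            for t in query_tokens:
                if t not in tf:
                    continue
                num = tf[t] * (self.k1 + 1)
                den = tf[t] + self.k1 * (1 - self.b + self.b * len(d) / self.avgdl)
                s += self.idf[t] * num / den
            out.append(s)
        return out

    def dumps(self):
        # Persist the tokenized corpus and parameters, not the derived stats.
        return json.dumps({"k1": self.k1, "b": self.b, "docs": self.doc_tokens})

    @classmethod
    def loads(cls, payload):
        data = json.loads(payload)
        return cls(data["docs"], data["k1"], data["b"])

bm25 = TinyBM25([["po", "1234", "invoice"], ["lunch", "friday"]])
idf_before = bm25.idf["po"]
bm25.add(["po", "5678", "shipping"])   # a new doc shifts the term weights
idf_after = bm25.idf["po"]
restored = TinyBM25.loads(bm25.dumps())
```

After the `add` call the weight for "po" drops, because the term now appears in two of three documents instead of one of two.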