The post discusses the capabilities of the llama-index library and how it compares to Langchain's retriever functionality. Community members compare Langchain's OpenSearchVectorSearch with llama-index's OpenSearch integration, weighing the pros and cons of each approach. Some have experimented with adding filtering capabilities to llama-index, and one member has submitted a pull request to improve the filtering support. The discussion also touches on the challenges of OpenSearch's k-NN filtering and the tradeoffs between the different approaches.
Another reason to use it is that AWS has an OpenSearch service, and afaik it's the only vector store db I can use while keeping my company's legal and security depts happy. (edit: and that satisfies my other requirements)
@Logan M I did get a basic boolean filter to work with small edits to llama-index/vector_stores/opensearch.py, but after comparing functionality in more depth, I think a wrapper is a better option until llama-index implements something more sophisticated.
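Roughly what I mean by a wrapper, as a sketch (assuming the llama_index.core BaseRetriever interface; `FilteredRetriever` and the metadata keys are made up for illustration, not llama-index API):

```python
# Rough sketch of the wrapper idea: post-filter whatever the inner retriever
# returns, instead of patching opensearch.py. Illustrative names only.
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle


class FilteredRetriever(BaseRetriever):
    """Wraps a retriever and keeps only nodes whose metadata matches exactly."""

    def __init__(self, inner: BaseRetriever, required_metadata: dict):
        self._inner = inner
        self._required = required_metadata
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        nodes = self._inner.retrieve(query_bundle)
        return [
            n
            for n in nodes
            if all(n.node.metadata.get(k) == v for k, v in self._required.items())
        ]


# usage: wrap the OpenSearch-backed retriever from an index, e.g.
# retriever = FilteredRetriever(index.as_retriever(), {"author": "alice"})
```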
Yea the vector store integrations are mostly community-driven. Feel free to make a PR. Sadly opensearch is barely used (at least judging from discord/github issues), so it's a little barebones at the moment.
As it turns out, OpenSearch's KNN filtering is applied after the k results are retrieved anyway, so it would be just about as easy to filter the response client-side as to ask for a filtered response. Their Lucene engine does support pre-filtering, but only for vectors up to dimension 1024.
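So with approximate k-NN plus a boolean filter, the query shape is roughly the following, and the filter only trims the k hits that already came back, which is why you can end up with fewer than k results (index/field names are made up):

```python
# Approximate k-NN with a boolean filter: OpenSearch finds the k nearest
# neighbors first, THEN applies the filter, so you can get < k hits back.
# Index/field names here are assumptions for the example.
from opensearchpy import OpenSearch  # pip install opensearch-py

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
query_vector = [0.1] * 1536  # your query embedding

body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [{"knn": {"embedding": {"vector": query_vector, "k": 10}}}],
            "filter": [{"term": {"metadata.author": "alice"}}],
        }
    },
}
resp = client.search(index="my-vectors", body=body)
```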
tl;dr exact-match filtering might not behave as expected unless you are using "Script Scoring" or "Painless Scripting", but those are not as scalable/flexible as approximate k-NN, to which only a "boolean" filter may be applied.
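For reference, the "Script Scoring" route pre-filters correctly and looks roughly like this (exact k-NN via the k-NN plugin's knn_score script; index/field names are assumptions):

```python
# "Script Scoring" route: the term query pre-filters the docs, then the k-NN
# plugin's knn_score script computes exact distances over that subset only.
# Exact-match filters behave as expected, but every filtered doc gets scored,
# which is why this scales worse than approximate k-NN. Names are assumed.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
query_vector = [0.1] * 1536  # your query embedding

body = {
    "size": 10,
    "query": {
        "script_score": {
            "query": {"term": {"metadata.author": "alice"}},  # applied first
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": "embedding",
                    "query_value": query_vector,
                    "space_type": "cosinesimil",
                },
            },
        }
    },
}
resp = client.search(index="my-vectors", body=body)
```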