The post discusses the capabilities of the llama-index library and how it compares to Langchain's retriever functionality. Community members compare Langchain's OpenSearchVectorSearch with llama-index's OpenSearch integration, weighing the pros and cons of each approach. Some have experimented with adding filtering capabilities to llama-index, and one member has submitted a pull request to improve the filtering support. The discussion also touches on the challenges of OpenSearch's k-NN filtering and the tradeoffs between the different approaches.
Another reason to use it is that AWS has an OpenSearch service, and afaik it's the only vector store db I can use while keeping my company's legal and security depts happy. (edit: and that satisfies my other requirements)
@Logan M I did get a basic boolean filter to work with small edits to llama-index/vector_stores/opensearch.py, but after comparing functionality in more depth, I think a wrapper is a better option until llama-index implements something more sophisticated.
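Roughly what I mean by a wrapper, as a sketch (assuming the llama_index.core BaseRetriever interface; `FilteredRetriever` and the metadata keys are made up for illustration, not llama-index API):

```python
# Rough sketch of the wrapper idea: post-filter whatever the inner retriever
# returns, instead of patching opensearch.py. Illustrative names only.
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle


class FilteredRetriever(BaseRetriever):
    """Wraps a retriever and keeps only nodes whose metadata matches exactly."""

    def __init__(self, inner: BaseRetriever, required_metadata: dict):
        self._inner = inner
        self._required = required_metadata
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        nodes = self._inner.retrieve(query_bundle)
        return [
            n
            for n in nodes
            if all(n.node.metadata.get(k) == v for k, v in self._required.items())
        ]


# usage: wrap the OpenSearch-backed retriever from an index, e.g.
# retriever = FilteredRetriever(index.as_retriever(), {"author": "alice"})
```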
Yea the vector store integrations are mostly community-driven. Feel free to make a PR. Sadly opensearch is barely used (at least judging from discord/github issues), so it's a little barebones at the moment.
As it turns out, OpenSearch's KNN filtering is applied after the k results are retrieved anyway, so it would be just about as easy to filter the response client-side as to ask for a filtered response. Their Lucene engine does support pre-filtering, but only for vectors up to dimension 1024.
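So with approximate k-NN plus a boolean filter, the query shape is roughly the following, and the filter only trims the k hits that already came back, which is why you can end up with fewer than k results (index/field names are made up):

```python
# Approximate k-NN with a boolean filter: OpenSearch finds the k nearest
# neighbors first, THEN applies the filter, so you can get < k hits back.
# Index/field names here are assumptions for the example.
from opensearchpy import OpenSearch  # pip install opensearch-py

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
query_vector = [0.1] * 1536  # your query embedding

body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [{"knn": {"embedding": {"vector": query_vector, "k": 10}}}],
            "filter": [{"term": {"metadata.author": "alice"}}],
        }
    },
}
resp = client.search(index="my-vectors", body=body)
```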
tl;dr exact-match filtering might not behave as expected unless you are using "Script Scoring" or "Painless Scripting", but those are not as scalable/flexible as approximate k-NN, to which only a "boolean" filter may be applied.
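For reference, the "Script Scoring" route pre-filters correctly and looks roughly like this (exact k-NN via the k-NN plugin's knn_score script; index/field names are assumptions):

```python
# "Script Scoring" route: the term query pre-filters the docs, then the k-NN
# plugin's knn_score script computes exact distances over that subset only.
# Exact-match filters behave as expected, but every filtered doc gets scored,
# which is why this scales worse than approximate k-NN. Names are assumed.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
query_vector = [0.1] * 1536  # your query embedding

body = {
    "size": 10,
    "query": {
        "script_score": {
            "query": {"term": {"metadata.author": "alice"}},  # applied first
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": "embedding",
                    "query_value": query_vector,
                    "space_type": "cosinesimil",
                },
            },
        }
    },
}
resp = client.search(index="my-vectors", body=body)
```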