Updated 6 months ago

Hi all, I am trying to get fine-grained control over what indexes are returned based on a query. I am thinking of defining a custom similarity score with Weaviate or Elasticsearch. Is it possible to use LlamaIndex for retrieving from a vector database using a custom similarity score? thx.
18 comments
the best approach is to implement this on the level of the vectordb you plan to use. then llamaindex would just inherit the results.
@Roland Tannous What I was thinking of was to define a custom retriever class, which inherits from ObjectRetriever, and in the retrieve method everything will be done using the vectordb. All indexing and storage will be done outside of LlamaIndex, but the retriever will be a llama index object, which can be integrated into other tools/agents. Is this what you had in mind? Thx btw.
the retrieval engines (weaviate..etc) are completely separate from llamaindex. llamaindex wraps around their libraries. so no you can't change the behavior of the retrieval engine, by just creating a "custom retriever class" in llamaindex, because you still need to wrap around the engines' sdk and you are limited by what they allow you to do.
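To make the point above concrete, here is a minimal sketch of what a retriever that only wraps an engine amounts to. `ThinRetriever`, `search_fn`, and `fake_engine_search` are hypothetical names for illustration, not LlamaIndex or Weaviate APIs; `search_fn` stands in for whatever search call the engine's SDK exposes.

```python
class ThinRetriever:
    """Hypothetical retriever that only forwards queries to a vector DB client."""

    def __init__(self, search_fn, top_k=5):
        # search_fn stands in for the engine SDK call:
        # (query, top_k) -> list of (doc_id, score) pairs
        self._search_fn = search_fn
        self._top_k = top_k

    def retrieve(self, query):
        # Ranking and scores are whatever the engine returns;
        # nothing is re-scored in this wrapper.
        return list(self._search_fn(query, self._top_k))


def fake_engine_search(query, top_k):
    """Stand-in for a real SDK call, returning canned (doc_id, score) pairs."""
    canned = [("doc7", 0.91), ("doc2", 0.84), ("doc9", 0.40)]
    return canned[:top_k]


retriever = ThinRetriever(fake_engine_search, top_k=2)
results = retriever.retrieve("fine-grained control")
```

The wrapper decides how many results to ask for and how to package them, but the similarity measure itself lives inside the engine.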
and unless you are an expert in information retrieval algorithms, i highly recommend you don't venture into this... it won't get anywhere
@Roland Tannous Thanks for your advice/warning 🙏.
revisiting this thread
why not approach this differently
instead of trying to change the cosine similarity measure
look at implementing a reranker
could be any existing reranker or simply yours
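A rough sketch of the "simply yours" option: take the candidates the engine already returned and re-order them with your own scoring function. Everything here (`rerank`, `title_overlap`, the candidate dicts) is a made-up illustration, and the word-overlap scorer is only a placeholder for whatever custom score you have in mind.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Re-order first-stage candidates with a custom score_fn(query, doc)."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    # Sort on the score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def title_overlap(query, doc):
    """Hypothetical scorer: fraction of query words that appear in the title."""
    query_words = set(query.lower().split())
    title_words = set(doc["title"].lower().split())
    if not query_words:
        return 0.0
    return len(query_words & title_words) / len(query_words)


# Pretend these came back from the vector DB's first-stage search.
candidates = [
    {"title": "Installing the client", "text": "body text"},
    {"title": "Custom similarity scores", "text": "body text"},
    {"title": "Similarity search basics", "text": "body text"},
]

top = rerank("custom similarity", candidates, title_overlap, top_n=2)
```

In practice you would over-fetch from the engine (say the top 50) and keep only the best few after reranking.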
Thanks for the suggestion @Roland Tannous. What I really want to do is have a similarity score that takes document structure into account. I want a weighted score where similarity with titles and section names is weighted higher than similarity with an ordinary text chunk. Also, I want to do this in the initial query and not as a postprocessing step like a reranker.
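For reference, the kind of score described here could be sketched as a weighted sum of per-field cosine similarities. This assumes each document stores separate embeddings for its title, section name, and text chunk; the field names, the weights, and the toy 2-d vectors are all made up for illustration. Whether a given engine can evaluate something like this inside the initial query is a separate question to check against its docs.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical weights: title and section matches count more than body text.
FIELD_WEIGHTS = {"title": 0.5, "section": 0.3, "chunk": 0.2}


def weighted_score(query_vec, doc_vecs, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field cosine similarities.

    doc_vecs maps a field name to that field's embedding. Fields missing
    from doc_vecs contribute nothing to the score.
    """
    return sum(
        w * cosine(query_vec, doc_vecs[field])
        for field, w in weights.items()
        if field in doc_vecs
    )


# Toy embeddings, invented for the example.
query = [1.0, 0.0]
doc_a = {"title": [1.0, 0.0], "section": [0.0, 1.0], "chunk": [0.0, 1.0]}
doc_b = {"title": [0.0, 1.0], "section": [0.0, 1.0], "chunk": [1.0, 0.0]}

score_a = weighted_score(query, doc_a)  # only the title matches the query
score_b = weighted_score(query, doc_b)  # only the chunk matches the query
```

With these weights a title match outranks a chunk match, which is the behaviour being asked for.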
do a hybrid search
that's the closest thing i can think of.
hybrid is a combination of keyword search and similarity search
a common starting point for the weights is around 70% similarity, 30% keyword. you can tune those
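A minimal sketch of that weighting, assuming you have a similarity score and a keyword score per document and combine them yourself. Real engines do this fusion internally and may use a different method (rank-based fusion, for example), so `min_max`, `hybrid_scores`, and the sample scores below are illustrative only.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.7):
    """Blend the two score sets: alpha for similarity, 1 - alpha for keyword."""
    # Normalise first, since keyword scores and cosine scores
    # live on different scales.
    vec = min_max(vector_scores)
    kw = min_max(keyword_scores)
    doc_ids = set(vec) | set(kw)
    # A document missing from one result set gets 0.0 for that part.
    return {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Made-up scores from a similarity search and a keyword search.
vector_scores = {"doc1": 0.92, "doc2": 0.80, "doc3": 0.75}
keyword_scores = {"doc2": 12.0, "doc3": 3.0, "doc4": 7.5}

fused = hybrid_scores(vector_scores, keyword_scores, alpha=0.7)
ranking = sorted(fused, key=fused.get, reverse=True)
```

Raising `alpha` favours the similarity side, lowering it favours exact keyword matches.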
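Otherwise, you can't really sidestep rerankers. The reason is mostly architectural: vector search relies on bi-encoders, which embed the query and each document separately so the index can be searched fast, while rerankers are typically cross-encoders, which score the query and a document together and are too slow to run over the whole index.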