no idea what that is, but I imagine it would be pretty easy to implement with existing modules in llama-index
"Retrieval Augmented Generation (RAG) enhances the abilities of Large Language
Models (LLMs) by enabling the retrieval of documents into the LLM context to
provide more accurate and relevant responses. Existing RAG solutions do not
focus on queries that may require fetching multiple documents with substantially
different contents. Such queries occur frequently, but are challenging because the
embeddings of these documents may be distant in the embedding space, making it
hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel
scheme designed to address this gap with a simple yet powerful idea: leveraging
activations of Transformer's multi-head attention layer, instead of the decoder
layer, as keys for fetching multi-aspect documents. The driving motivation is that
different attention heads can learn to capture different data aspects. Harnessing the
corresponding activations results in embeddings that represent various facets of
data items and queries, improving the retrieval accuracy for complex queries. We
provide an evaluation methodology and metrics, synthetic datasets, and real-world
use cases to demonstrate MRAG's effectiveness, showing improvements of up
to 20% in relevance over standard RAG baselines. MRAG can be seamlessly
integrated with existing RAG frameworks and benchmarking tools like RAGAS as
well as different classes of data stores."
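(Aside: a rough Python sketch of the core idea described in that abstract, not the paper's actual code. It hooks the last self-attention layer of a BERT-style encoder and mean-pools its per-head activations into one embedding per head. The model name and the module path are my own assumptions; other architectures expose attention differently.)

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # assumption: any BERT-style encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

captured = {}

def grab_attention(module, inputs, outputs):
    # BertSelfAttention returns (context_layer, ...); the heads are concatenated
    # along the last dim, so they can be split back into per-head slices
    captured["attn"] = outputs[0]

# module path is BERT-specific; adjust for other model families
model.encoder.layer[-1].attention.self.register_forward_hook(grab_attention)

def multi_head_embeddings(text: str) -> list[list[float]]:
    """Return one pooled embedding per attention head of the last layer."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        model(**inputs)
    attn = captured["attn"].squeeze(0)        # (seq_len, hidden)
    n_heads = model.config.num_attention_heads
    head_dim = attn.shape[-1] // n_heads
    heads = attn.view(-1, n_heads, head_dim)  # (seq_len, heads, head_dim)
    return heads.mean(dim=0).tolist()         # mean-pool over tokens, per head
```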
that sounds incredibly complex to implement, and is only possible with raw custom pytorch models (I think)
My hot take is it's not worth the effort, especially when they exclude comparisons to other lightweight retrieval methods like hybrid retrieval, query rewriting, and reranking
It seems the simple directory reader would need to be modified to implement this. What do you think @Logan M?
Ummm i don't think simple directory reader is what needs to be modified? Isn't the paper more so talking about changes to the embedding model?
what are your thoughts on this logan??
it seems this is needed in order to save the standard and layer embeddings
they are all orthogonal, that's why the comparison is against standard RAG
I think I gave my thoughts above, it's the embedding model that needs to be modified, no?
imo doesn't seem worth the effort until someone open-sources the approach
ye, with a custom embedding
unless you want to implement the embedding model
ye, the embedding wasn't that hard
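(Minimal sketch of what a per-head custom embedding could look like in llama-index, reusing the multi_head_embeddings helper sketched above. One instance per head; SingleHeadEmbedding and its fields are names I made up, not from the paper or the library, though the BaseEmbedding subclassing pattern itself is the documented way to plug in a custom embedding model.)

```python
from typing import Any, List

from llama_index.core.bridge.pydantic import PrivateAttr
from llama_index.core.embeddings import BaseEmbedding


class SingleHeadEmbedding(BaseEmbedding):
    """Exposes a single attention head's embedding so a standard vector store can use it."""

    _head: int = PrivateAttr()

    def __init__(self, head: int, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._head = head  # which attention head this instance represents

    def _get_text_embedding(self, text: str) -> List[float]:
        return multi_head_embeddings(text)[self._head]

    def _get_query_embedding(self, query: str) -> List[float]:
        return multi_head_embeddings(query)[self._head]

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)
```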
the problem is saving them all
classic research code, built purely to run benchmarks vs. actual usage lol
would need to pull out the actual model/approach from that code and make it runnable as a standalone module.
Then I think what you'd have to do is create a vector store collection per "head" since they are all different embedding spaces (I think?)
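(Sketch of that per-collection-per-head idea, assuming the SingleHeadEmbedding class above: one VectorStoreIndex per head, with a naive vote-count merge at query time. The paper uses a more elaborate weighted voting scheme; this just shows the shape of it, and in practice each index would point at its own collection in a real vector DB.)

```python
from llama_index.core import Document, VectorStoreIndex

documents = [Document(text="..."), Document(text="...")]  # your corpus here
n_heads = 12  # matches the number of heads in the encoder used above

# one index (i.e. one collection / embedding space) per attention head
indices = [
    VectorStoreIndex.from_documents(
        documents, embed_model=SingleHeadEmbedding(head=h)
    )
    for h in range(n_heads)
]

def retrieve_multi_head(query: str, top_k: int = 3) -> list[tuple[str, int]]:
    # naive merge: count how many heads retrieved each node and rank by that
    votes: dict[str, int] = {}
    for index in indices:
        retriever = index.as_retriever(similarity_top_k=top_k)
        for result in retriever.retrieve(query):
            node_id = result.node.node_id
            votes[node_id] = votes.get(node_id, 0) + 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
```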
that was my initial idea till I came up with that multi embedding requirement, bc this approach seems too rough
for this I would need to use 10-20 stores
I don't think there's another way to easily support this in llama-index. Some vector stores support multi-vector in the same collection, but llama-index isn't set up to easily take advantage of that
You could write your own custom vector store that wraps both the vector db of your choice and the embedding model itself
oh haha that makes way more sense haha
Yea that could be added, but it will be a very long road to supporting this in vector stores
will this feature request be rejected then?
I see, thanks Logan, there's still hope ;D
yea reopened it, but I can't say when it'll happen
thanks ya very much Logan. hope to see this... soon :yao:
Hi Logan, how have ya been? I would like to know if you have considered working on this feature