LlamaIndex

Log inLog into community

Find answers from the community

Updated 4 months ago

@Logan M may multihead rag please be

@Logan M may multihead rag please be

At a glance

The post asks if the "multihead rag" feature can be implemented. The comments discuss the Multi-Head RAG (MRAG) approach, which is a novel scheme designed to enable retrieval of multiple documents with substantially different contents for complex queries. Community members discuss the complexity of implementing MRAG, noting that it would require modifications to the embedding model and may not be worth the effort until the approach is open-sourced. There is a discussion around the challenges of integrating MRAG into the llama-index library, such as the need for multiple vector stores to handle the different embedding spaces. The community members express interest in seeing this feature implemented, but acknowledge that it may be a long road to support it in llama-index.

Useful resources

·

@Logan M may multihead rag please be implemented?

L

b

29 comments

no idea what that is, but I imagine it would be pretty easy to implement with existing modules in llama-index 🙂

here it is: https://arxiv.org/pdf/2406.05085

"Retrieval Augmented Generation (RAG) enhances the abilities of Large Language
Models (LLMs) by enabling the retrieval of documents into the LLM context to
provide more accurate and relevant responses. Existing RAG solutions do not
focus on queries that may require fetching multiple documents with substantially
different contents. Such queries occur frequently, but are challenging because the
embeddings of these documents may be distant in the embedding space, making it
hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel
scheme designed to address this gap with a simple yet powerful idea: leveraging
activations of Transformer’s multi-head attention layer, instead of the decoder
layer, as keys for fetching multi-aspect documents. The driving motivation is that
different attention heads can learn to capture different data aspects. Harnessing the
corresponding activations results in embeddings that represent various facets of
data items and queries, improving the retrieval accuracy for complex queries. We
provide an evaluation methodology and metrics, synthetic datasets, and real-world
use cases to demonstrate MRAG’s effectiveness, showing improvements of up
to 20% in relevance over standard RAG baselines. MRAG can be seamlessly
integrated with existing RAG frameworks and benchmarking tools like RAGAS as
well as different classes of data stores."

that sounds incredibly complex to implement, and is only possible with raw custom pytorch models (I think)

My hot take is its not worth the effort, especially when they exlcude comparisons to other lightweight retrieval methods like hybrid retrieval, query rewriting, reranking

Thanks Logan!

It seems the simple directory reader should be modified in order to implement this, what do you think @Logan M ?

Ummm i don't think simple directory reader is what needs to be modified? Isn't the paper more so talking about changes to the embedding model?

what are your thoughts on this logan??

it seems this is needed in order to savethe standard and layer embeddings

they are all orthogonal, thatp's why the comparison against standard rag

I think I gave my thoughts above, its the embedding model that needs to be modified no?

imo doesn't seem worth the effort until someone opensources the approach

ye, with a customembedding

unless you want to implement the embedding model 😉

there is, but not using the li api https://github.com/spcl/MRAG

ye, the embedding wasnpt that hard

the problem is saving them all

classic research code, built purely to run benchmarks vs. actual usage lol

would need to pull out the actual model/approach from that code and make it runnable as a standalone module.

Then I think what you'd have to do is create a vector store collection per "head" since they are all different embedding spaces (I think?)

that was my initial idea till I came up with that multi embedding requirement, bc this approach seems too rough

for this I would need to use 10 -20 stores

I don't think theres another way to easily support this in llama-index. Some vector stores support multi-vector in the same collection, but llama-index isn't setup to easily take advantage of that

You could write your own custom vector store that wraps both the vector db of your choice and the embedding model itself

wait, when I asked "what are your thoughts on this logan??" I forgot to paste this https://github.com/run-llama/llama_index/issues/10486

oh haha that makes way more sense haha

Yea that could be added, but it will be a very long road to supporting this in vector stores

will this feature request be rejected then?

I see, thanks Logan, there still hope ;D

yea reopened it, but I cant say when itll happen 😅

thanks ya very much Logan. hope to see this... soon :yao:

Hi Logan, how are ya been?, I would like to know if you have considered working on this feature

Add a reply

Sign up and join the conversation on Discord