So here is the scenario that I am trying to accomplish: I have a PDF containing text, images, and tables. I need to develop a RAG pipeline that retrieves images along with the text based on query relevance.

I can achieve this by creating TextNodes, ImageNodes, and IndexNodes and then using RecursiveRetriever to retrieve the nodes along with the images.
However, this approach has a problem: if there are more TextNodes with relevant text than similarity_top_k, the ImageNode won't be retrieved.
To avoid this, is it possible to have a workaround (or a feature in the library) such that the RecursiveRetriever retrieves TextNodes and ImageNodes separately, along with their scores, so that as a user I can decide whether to pass just the TextNodes or TextNodes + ImageNodes to the LLM in its context?
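Roughly, the setup looks like this (simplified sketch, not my exact code; imports assume a recent llama-index version):
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.schema import ImageNode, IndexNode, TextNode

# plain text chunks from the PDF
text_nodes = [TextNode(text="...text chunk from the PDF...")]

# an image plus an LLM-generated summary; the IndexNode links the summary to the image
image_node = ImageNode(image_path="figure_1.png")  # illustrative path
summary_node = IndexNode(
    text="Summary of figure 1 ...",
    index_id=image_node.node_id,
)

# only text + summary nodes are embedded; the ImageNode is resolved at retrieval time
index = VectorStoreIndex(nodes=text_nodes + [summary_node])
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": index.as_retriever(similarity_top_k=3)},
    node_dict={image_node.node_id: image_node},
)
nodes = retriever.retrieve("What does figure 1 show?")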

This use case is an important one IMO, and I feel it should be built into the library. I would love to hear some discussion on this, and I am more than happy to contribute if the need arises.
There is a multimodal index/retriever for this
The as_retriever() makes a multimodal retriever, which has text_retrieve() and image_to_text_retrieve() functions, among others
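Something like this (rough sketch; the exact import path and constructor args depend on your llama-index version, and the image store needs an image embedding model such as CLIP):
Plain Text
from llama_index.core.indices.multi_modal.base import MultiModalVectorStoreIndex

# stored_nodes: your existing TextNodes + ImageNodes
mm_index = MultiModalVectorStoreIndex(nodes=stored_nodes)
mm_retriever = mm_index.as_retriever(similarity_top_k=2, image_similarity_top_k=2)

text_results = mm_retriever.text_retrieve("What are the main paradigms of RAG?")  # text nodes only
all_results = mm_retriever.retrieve("What are the main paradigms of RAG?")        # text + image nodes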
but does it do it recursively?
as in, if my node list contains an IndexNode, does it fetch the mapped ImageNode and retrieve it?
all retrievers do that by default, it's baked into the BaseRetriever class
cool, will try it, and revisit this thread if it doesn't work
by the way, I also want to chat over this kind of use case, but I have noticed that there is no ChatEngine specific for this?
does the ContextChatEngine work in the above case?
Hmmm yea there is no multi-modal chat engine just yet -- you'd have to make your own loop using llm calls
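Roughly something like this (untested sketch; assumes the OpenAIMultiModal LLM and its complete(prompt, image_documents=...) call, plus a retriever that returns both text and image nodes):
Plain Text
from llama_index.core.schema import ImageNode
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview")
history = []

def chat(query: str, retriever) -> str:
    # retrieve -> split text vs image nodes -> build prompt -> call the multimodal LLM
    nodes = retriever.retrieve(query)
    text_context = "\n\n".join(n.get_content() for n in nodes if not isinstance(n.node, ImageNode))
    images = [n.node for n in nodes if isinstance(n.node, ImageNode)]
    prompt = "\n".join(history + [f"Context:\n{text_context}", f"User: {query}", "Assistant:"])
    response = mm_llm.complete(prompt=prompt, image_documents=images)
    history.extend([f"User: {query}", f"Assistant: {response.text}"])
    return response.text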
okay, I will try to contribute to the library by writing one, but I would need some guidance
I am getting the following error: 'MultiModalVectorIndexRetriever' object has no attribute 'object_map'
hmm.. maybe that wasn't updated to handle this recursive business
Here is the code
Plain Text
# build the multimodal index over the existing nodes, with no separate image vector store
mm_vector_store_index = MultiModalVectorStoreIndex(nodes=stored_nodes, image_vector_store=None)
mm_vector_retriever = mm_vector_store_index.as_retriever(similarity_top_k=2, image_similarity_top_k=2)
mm_vector_retriever.retrieve("What are main paradigm of RAG?")
also I think this won't work for what I am trying to achieve
I am not storing the images separately
and I do not want to generate embeddings for the images
I am generating summaries for the images and linking them to the actual images using IndexNodes
I am trying to do what Lance Martin explained in this video -
https://www.youtube.com/watch?v=Rcqy92Ik6Uo
and I have already achieved that using LlamaIndex, whereas he has done it in LangChain
but there are a couple of limitations to this approach, which I am trying to solve using llama-index
and hence I posted my first question
maybe you can use metadata filtering to filter out images vs text
if you attach metadata to your nodes
well filtering is after the retrieval stage
I am thinking, is there a way to separate out the nodes and run similarity over two sets of nodes?
no, filtering is before actually
that's how vector dbs implement it -- apply a filter, then perform similarity search
so then I need to run two retrievals, one only for text nodes and one for text nodes linked with image nodes?
Yea, that's how you would retrieve text vs. images separately
Plain Text
metadata_filters = MetadataFilters(
    filters=[MetadataFilter(key="type", value="image", operator=FilterOperator.EQ)]
)
vector_retriever = vector_store_index.as_retriever(similarity_top_k=2, filters=metadata_filters)
is there a better way to write MetadataFilters? I couldn't find one
That is the way to do it
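Putting it together, something like this (rough sketch; assumes you attach a "type" metadata key to every node when building the index, and that your vector store supports metadata filters):
Plain Text
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters

# one retriever per node "type", so text and image-summary nodes are scored separately
image_filters = MetadataFilters(
    filters=[MetadataFilter(key="type", value="image", operator=FilterOperator.EQ)]
)
text_filters = MetadataFilters(
    filters=[MetadataFilter(key="type", value="text", operator=FilterOperator.EQ)]
)

image_retriever = vector_store_index.as_retriever(similarity_top_k=2, filters=image_filters)
text_retriever = vector_store_index.as_retriever(similarity_top_k=2, filters=text_filters)

query = "What are the main paradigms of RAG?"
text_nodes = text_retriever.retrieve(query)
image_nodes = image_retriever.retrieve(query)

# inspect scores and decide which nodes to hand to the LLM
for n in text_nodes + image_nodes:
    print(n.score, n.node.metadata.get("type"))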