There is a multimodal index/retriever for this
The as_retriever() makes a multimodal retriever, which has text_retrieve() and image_to_text_retrieve() methods, among others
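something like this, as a rough sketch -- assumes an index built from a folder of mixed text/image files (images are embedded with CLIP by default, so the clip embedding package would need to be installed):

from llama_index.core import SimpleDirectoryReader
from llama_index.core.indices import MultiModalVectorStoreIndex

# load a folder containing both text files and images
documents = SimpleDirectoryReader("./data").load_data()
index = MultiModalVectorStoreIndex.from_documents(documents)

retriever = index.as_retriever(similarity_top_k=2, image_similarity_top_k=2)
text_results = retriever.text_retrieve("my query")             # text -> text
image_results = retriever.text_to_image_retrieve("my query")   # text -> images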
but does it do it recursively?
as in, if my node list contains an IndexNode,
does it fetch the mapped ImageNode
and retrieve it?
all retrievers do that by default, it's baked into the BaseRetriever class
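e.g. something like this (untested sketch -- assumes the IndexNode carries the ImageNode in its obj field, so the retriever can resolve it through its object_map; the path and summary text are just illustrative):

from llama_index.core import VectorStoreIndex
from llama_index.core.schema import ImageNode, IndexNode

image_node = ImageNode(image_path="./figures/fig1.png")
summary_node = IndexNode(
    text="Text summary of figure 1 ...",  # this text is what gets embedded
    index_id=image_node.node_id,
    obj=image_node,                       # resolved automatically at retrieve time
)

index = VectorStoreIndex(nodes=[], objects=[summary_node])
results = index.as_retriever().retrieve("figure 1")  # returns the mapped ImageNode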
cool, will try it, and revisit this thread if it doesn't work
by the way, I also want to chat over this kind of use case, but have noticed that there is no ChatEngine specific for this?
does the ContextChatEngine work in the above case?
Hmmm yea there is no multi-modal chat engine just yet -- you'd have to make your own loop using llm calls
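roughly this shape (just a sketch, not a real chat-engine API -- `retriever` is whatever retriever you built above, and the prompt format is purely illustrative):

from llama_index.core.schema import ImageNode
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview")
history = []

def chat(query):
    # retrieve, then split results into image nodes and plain text context
    nodes = [n.node for n in retriever.retrieve(query)]
    images = [n for n in nodes if isinstance(n, ImageNode)]
    context = "\n".join(n.get_content() for n in nodes if not isinstance(n, ImageNode))
    prompt = (
        "Chat history:\n" + "\n".join(history)
        + f"\n\nContext:\n{context}\n\nUser: {query}\nAssistant:"
    )
    response = mm_llm.complete(prompt=prompt, image_documents=images)
    history.append(f"User: {query}")
    history.append(f"Assistant: {response.text}")
    return response.text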
okay, i will try to contribute to the library by writing one, but i would need some guidance
I am getting the following error: 'MultiModalVectorIndexRetriever' object has no attribute 'object_map'
hmm.. maybe that wasn't updated to handle this recursive business
Here is the code
from llama_index.core.indices import MultiModalVectorStoreIndex

mm_vector_store_index = MultiModalVectorStoreIndex(nodes=stored_nodes, image_vector_store=None)
mm_vector_retriever = mm_vector_store_index.as_retriever(similarity_top_k=2, image_similarity_top_k=2)
mm_vector_retriever.retrieve("What are the main paradigms of RAG?")
also i think that this won't work for what I am trying to achieve
I am not storing the images separately
and i do not want to generate the embeddings for the images
i am generating summaries for the images, and linking them to the actual images using IndexNodes (roughly the pattern sketched below)
and I have already achieved that using LlamaIndex, whereas he did it in LangChain
but there are a couple of limitations to this approach, which I am trying to solve using llama-index
and hence i posted my first question
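for reference, the linking step looks roughly like this (a sketch -- `image_nodes` and `mm_llm` are assumed to exist already, e.g. a list of ImageNodes and a multimodal LLM like the one above):

from llama_index.core.schema import IndexNode

summary_nodes = []
for image_node in image_nodes:
    # summarize each image with a multimodal LLM; no image embeddings needed
    summary = mm_llm.complete(
        prompt="Describe this image in a few sentences.",
        image_documents=[image_node],
    ).text
    # index the summary text, but link it back to the raw image
    summary_nodes.append(
        IndexNode(text=summary, index_id=image_node.node_id, obj=image_node)
    )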
maybe you can use metadata filtering to filter out images vs text
if you attach metadata to your nodes
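i.e. tag them up front, something like this (the node list names are just placeholders for however you build your nodes):

# attach a "type" tag when building nodes so it can be filtered on later
for node in text_nodes:
    node.metadata["type"] = "text"
for node in image_summary_nodes:  # the IndexNodes that wrap images
    node.metadata["type"] = "image"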
well filtering is after the retrieval stage
i am thinking, is there a way to separate out the nodes and run similarity over two sets of nodes
no, filtering is before actually
that's how vector dbs implement it -- apply a filter, then perform similarity search
so then i need to run two retrievals, one only for text nodes and one for text nodes linked with image nodes?
Yea, that's how you would retrieve text vs. images separately
from llama_index.core.vector_stores import MetadataFilters, MetadataFilter, FilterOperator

metadata_filters = MetadataFilters(filters=[MetadataFilter(key="type", value="image", operator=FilterOperator.EQ)])
vector_retriever = vector_store_index.as_retriever(similarity_top_k=2, filters=metadata_filters)
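and i guess the text side would be the same thing with the filter flipped (reusing the imports above):

text_filters = MetadataFilters(
    filters=[MetadataFilter(key="type", value="text", operator=FilterOperator.EQ)]
)
text_retriever = vector_store_index.as_retriever(similarity_top_k=2, filters=text_filters)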
is there a better way to write MetadataFilters? I couldn't find one