
Hello,
I just found that when I build a MultiModalVectorStoreIndex from a ChromaDB collection containing text and image nodes, it ignores the ImageNodes. However, if I instantiate a VectorStoreIndex using the same nodes or documents, it correctly retrieves the image nodes. I suppose it's a bug?
what was the code?
Here you can find my code using MultiModalVectorStoreIndex:
Plain Text
import chromadb
from chromadb.config import Settings
from chromadb.utils.data_loaders import ImageLoader
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.EphemeralClient(
    Settings(anonymized_telemetry=False, allow_reset=True)
)
chroma_collection = client.create_collection(
    name="multimodal_collection_test",
    embedding_function=embedding_function,
    data_loader=ImageLoader(),
)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader(mm_rag_helper.rag_folder_path).load_data()

# Node processing which returns a List[Union[TextNode, ImageNode]]
# where each ImageNode has its text field filled with a description.
nodes_lst = compute_nodes(
    documents=documents,
    lmm=lmm,  # GPT-4o
    prompt_template=useful_prompt_template,
    embedding_model=embedding_model,
)

index = MultiModalVectorStoreIndex(
    nodes=nodes_lst,
    embed_model=embedding_model,
    storage_context=storage_context,
    is_image_to_text=True,
)
And here is the code for the models:
Plain Text
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal

embedding_function = OpenAIEmbeddingFunction(
    api_key=app_settings.azure.embedding_api_key,
    model_name=app_settings.azure.embedding_model,
    deployment_id=app_settings.azure.embedding_model,
    api_type="azure",
    api_base=app_settings.azure.embedding_uri,
    api_version=app_settings.openai.api_version,
)
lmm = AzureOpenAIMultiModal(
    engine=app_settings.openai.deployment_id,
    model=app_settings.openai.deployment_id,
    temperature=app_settings.rag.llm_config.temperature,
    azure_endpoint=app_settings.openai.endpoint,
    api_key=app_settings.openai.api_key,
    api_version=app_settings.openai.api_version,
    image_detail="high",
    max_new_tokens=700,
)
embedding_model = AzureOpenAIEmbedding(
    model=app_settings.azure.embedding_model,
    deployment_name=app_settings.azure.embedding_model,
    api_key=app_settings.azure.embedding_api_key,
    azure_endpoint=app_settings.azure.embedding_uri,
    api_version=app_settings.openai.api_version,
)
It didn't ignore the image nodes, but you didn't provide an image vector store. (They need to be separate due to the different embeddings.)
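Something along these lines should work (a rough sketch, not tested; it reuses `client`, `nodes_lst`, and `embedding_model` from your snippet, and the collection names are just examples):
Plain Text
text_collection = client.create_collection(name="mm_text_collection")
image_collection = client.create_collection(name="mm_image_collection")

text_store = ChromaVectorStore(chroma_collection=text_collection)
image_store = ChromaVectorStore(chroma_collection=image_collection)

# Register both stores; image embeddings go into the image store
storage_context = StorageContext.from_defaults(
    vector_store=text_store,
    image_store=image_store,
)

index = MultiModalVectorStoreIndex(
    nodes=nodes_lst,
    embed_model=embedding_model,  # text embeddings
    # image_embed_model is left at its default (CLIP); pass your own if needed
    storage_context=storage_context,
    is_image_to_text=True,
)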
I've also tested this solution with ChromaDB, but it didn't retrieve any ImageNodes either. By the way, I only wanted to use a text embedding model and apply it to the image summaries in order to retrieve the original images.
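For reference, here is roughly what the plain VectorStoreIndex variant that does retrieve the ImageNodes through their summaries looks like (simplified; the query string and top_k are just examples):
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import ImageNode

# A single text embedding model covers both TextNodes and the ImageNode summaries
text_index = VectorStoreIndex(
    nodes=nodes_lst,
    embed_model=embedding_model,
)

retriever = text_index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("query about a figure")

# Recover the original images from any ImageNodes that come back
for result in results:
    if isinstance(result.node, ImageNode):
        print(result.node.image_path, result.score)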