When I build a `MultiModalVectorStoreIndex` from a ChromaDB collection containing text and image nodes, it ignores the `ImageNode`s. However, if I instantiate a `VectorStoreIndex` using the same nodes or documents, it correctly retrieves the image nodes. I suppose this is a bug?

```python
client = chromadb.EphemeralClient(Settings(anonymized_telemetry=False, allow_reset=True))
chroma_collection = client.create_collection(
    name="multimodal_collection_test",
    embedding_function=embedding_function,
    data_loader=ImageLoader(),
)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader(mm_rag_helper.rag_folder_path).load_data()

# Node processing which returns a List[Union[TextNode, ImageNode]],
# where each ImageNode has its text field filled with a description.
nodes_lst = compute_nodes(
    documents=documents,
    lmm=lmm,  # GPT-4o
    prompt_template=useful_prompt_template,
    embedding_model=embedding_model,
)

index = MultiModalVectorStoreIndex(
    nodes=nodes_lst,
    embed_model=embedding_model,
    storage_context=storage_context,
    is_image_to_text=True,
)
```
```python
embedding_function = OpenAIEmbeddingFunction(
    api_key=app_settings.azure.embedding_api_key,
    model_name=app_settings.azure.embedding_model,
    deployment_id=app_settings.azure.embedding_model,
    api_type="azure",
    api_base=app_settings.azure.embedding_uri,
    api_version=app_settings.openai.api_version,
)

lmm = AzureOpenAIMultiModal(
    engine=app_settings.openai.deployment_id,
    model=app_settings.openai.deployment_id,
    temperature=app_settings.rag.llm_config.temperature,
    azure_endpoint=app_settings.openai.endpoint,
    api_key=app_settings.openai.api_key,
    api_version=app_settings.openai.api_version,
    image_detail="high",
    max_new_tokens=700,
)

embedding_model = AzureOpenAIEmbedding(
    model=app_settings.azure.embedding_model,
    deployment_name=app_settings.azure.embedding_model,
    api_key=app_settings.azure.embedding_api_key,
    azure_endpoint=app_settings.azure.embedding_uri,
    api_version=app_settings.openai.api_version,
)
```