@kapa.ai I am trying to embed my directory using open clip, but how do I tell what is the text associated with the document that it is embedded with?
Please stand by @cmosguy as I search the knowledge base. I normally take about 15 seconds or so. :wave:
----------------------
To answer your question about embedding a directory using OpenCLIP and associating text with the embedded documents in LlamaIndex, I'll refer to the relevant information from the provided knowledge sources.

LlamaIndex supports using OpenCLIP for embedding images, but the process of associating text with these embeddings is not directly addressed in the given knowledge sources. However, I can provide some insights based on the available information:

  1. OpenCLIP can be used as an embedding function in LlamaIndex, as shown in the ChromaMultiModalDemo.ipynb example:

```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

# set default text and image embedding functions
embedding_function = OpenCLIPEmbeddingFunction()
```

  2. When working with documents in LlamaIndex, each document typically has both text content and metadata. The text associated with a document is usually stored in the text attribute of the Document object.
  3. In a multi-modal context (which includes images), you might need to handle both text and image data. The exact method for associating text with embedded images depends on how you're structuring your data and using the LlamaIndex API.
However, the knowledge sources don't provide specific information on how to directly associate text with OpenCLIP embeddings in the context of a directory of documents. To get a more precise answer, you might need to consult the LlamaIndex documentation or examples specifically related to multi-modal indexing with OpenCLIP.

If you're working with a directory of mixed content (text and images), you may need to:
  1. Use a SimpleDirectoryReader to load your documents.
  2. Process each document to extract text and image data.
  3. Use OpenCLIP to embed the images.
  4. Associate the original text with the embedded images, possibly by storing both in a custom data structure or by using LlamaIndex's document metadata.
For the most accurate and up-to-date information on this specific use case, I recommend checking the latest LlamaIndex documentation or reaching out to the LlamaIndex community for guidance.
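
As a rough sketch of that workflow (not taken from the knowledge sources; the directory path and the `caption` metadata key are illustrative assumptions), one way to keep the associated text attached to each item is to store it in the document's metadata before indexing:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.schema import ImageDocument

# 1. Load the directory (path is an assumption)
documents = SimpleDirectoryReader("./guide").load_data()

# 2-4. Attach the text you want associated with each image to its metadata,
#      so the pairing is preserved through embedding and retrieval
for doc in documents:
    if isinstance(doc, ImageDocument):
        doc.metadata["caption"] = "text you want associated with this image"
        print(f"Tagged {doc.metadata.get('file_path')}")
```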
----------------------
@kapa.ai

I have the code:

```python
import chromadb
from chromadb.utils.data_loaders import ImageLoader
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex

embedding_function = OpenCLIPEmbeddingFunction()
image_loader = ImageLoader()

# create client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.get_or_create_collection(
    "multimodal_collection",
    embedding_function=embedding_function,
    data_loader=image_loader,
)

# load documents
documents = SimpleDirectoryReader("./guide").load_data()

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embedding_large,
)
```

When I do:

```python
retriever = index.as_retriever(similarity_top_k=50)
retrieval_results = retriever.retrieve("what is the in memory model")
```

it does not retrieve any of the images that have text in them. How can I check that the image is getting embedded properly? Let's say my image is ./guide/foobar.png; how can I check that any text or embedding associated with it as a document is working?
Let me check the knowledge base for you again. One moment! :books:
----------------------
To address your question about checking if images are being properly embedded and retrieved in your LlamaIndex setup, I'll provide some guidance based on the available information from the knowledge sources.

First, it's important to note that the code you've provided is using a multi-modal approach, which should handle both text and images. However, you're experiencing issues with image retrieval. Let's break down the problem and suggest some steps to check and potentially resolve the issue:

  1. Verify Image Loading:
    Make sure that your images are actually being loaded into the index. You can do this by checking the documents variable after loading:

```python
documents = SimpleDirectoryReader("./guide").load_data()
for doc in documents:
    if hasattr(doc, 'image'):
        print(f"Image loaded: {doc.metadata['file_path']}")
```


  2. Check Image Embedding:
    To verify that your images are being embedded properly, you can try to embed a single image directly using the OpenCLIP embedding function:

```python
from PIL import Image

image_path = "./guide/foobar.png"
image = Image.open(image_path)
image_embedding = embedding_function.encode_image(image)
print(f"Image embedding shape: {image_embedding.shape}")
```


If this works, it confirms that OpenCLIP can embed your image.
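
If you prefer to stay inside LlamaIndex (an alternative not mentioned above, so treat it as a sketch), the ClipEmbedding class from the llama-index-embeddings-clip package can embed a single image file directly:

```python
# Requires: pip install llama-index-embeddings-clip
from llama_index.embeddings.clip import ClipEmbedding

clip_embed = ClipEmbedding()
# Embed one image file and inspect the vector length
embedding = clip_embed.get_image_embedding("./guide/foobar.png")
print(f"CLIP image embedding length: {len(embedding)}")
```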

  3. Inspect the Index:
    After creating the index, you can inspect its contents to ensure images are included:

```python
for node in index.docstore.docs.values():
    if hasattr(node, 'image'):
        print(f"Image in index: {node.metadata['file_path']}")
```

  4. Use a Multi-Modal Retriever:
    Your current code is using a standard retriever, which might not be optimized for multi-modal retrieval. With a multi-modal index, the retriever can return both text and image nodes, so inspect the results to see whether any images come back:

```python
retriever = index.as_retriever(similarity_top_k=50)
retrieval_results = retriever.retrieve("what is the in memory model")

for result in retrieval_results:
    if hasattr(result.node, 'image'):
        print(f"Retrieved image: {result.node.metadata['file_path']}")
    else:
        print(f"Retrieved text: {result.node.text[:100]}...")
```

  5. Check Image-to-Image Retrieval:
    If you specifically want to test image retrieval, you can use the image-to-image retrieval method:

```python
retriever = index.as_retriever(image_similarity_top_k=5)
image_retrieval_results = retriever.image_to_image_retrieve("./guide/foobar.png")

for result in image_retrieval_results:
    print(f"Retrieved image: {result.node.metadata['file_path']}")
```


These steps should help you verify that your images are being properly embedded and retrieved. If you're still experiencing issues, it might be worth checking if your MultiModalVectorStoreIndex is correctly configured to handle both text and images.
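
For reference, here is a minimal sketch of an explicitly configured MultiModalVectorStoreIndex (assuming the `documents` and `storage_context` from your code above, and that the llama-index-embeddings-clip package is installed); the exact keyword arguments may vary by LlamaIndex version:

```python
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.embeddings.clip import ClipEmbedding

# Text nodes use the default text embed model; image nodes use CLIP
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    image_embed_model=ClipEmbedding(),
)
```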

For more detailed information on multi-modal indexing and retrieval in LlamaIndex, you can refer to these resources:

  1. Multi-Modal RAG using Nomic Embed and Anthropic
  2. Image to Image Retrieval using CLIP embedding
These examples provide more context on how to work with multi-modal data in LlamaIndex and might help you troubleshoot your specific setup.
----------------------
@kapa.ai AttributeError: 'OpenCLIPEmbeddingFunction' object has no attribute 'encode_image'
Let me check the knowledge base for you again. One moment! :books:
----------------------
I apologize for the confusion in my previous response. The error you're encountering suggests that the OpenCLIPEmbeddingFunction doesn't have a direct encode_image method as I incorrectly suggested. Let's correct this and provide a more accurate way to check your image embeddings using the OpenCLIP embedding function in LlamaIndex.

Based on the available knowledge sources, here's a more appropriate way to check if your images are being embedded properly:

  1. First, make sure you're using the correct import for the OpenCLIP embedding function:

```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
```

  2. To check if an individual image is being embedded correctly, you can use the embedding function directly:

```python
import base64
from PIL import Image
from io import BytesIO

# Initialize the embedding function
embedding_function = OpenCLIPEmbeddingFunction()

# Load and prepare the image
image_path = "./guide/foobar.png"
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode()

# Create a list with a single image (OpenCLIP expects a list)
images = [image_data]

# Get the embedding
image_embedding = embedding_function(images)

print(f"Image embedding shape: {len(image_embedding[0])}")
```


This approach uses the OpenCLIP embedding function as it's intended to be used with Chroma, which expects base64-encoded image data.

To verify that your images are being properly embedded and included in your index, you can modify your document loading and indexing process to print out information about each document:

```python
from llama_index.core.schema import ImageNode

documents = SimpleDirectoryReader("./guide").load_data()
for doc in documents:
    if isinstance(doc, ImageNode):
        print(f"Image document found: {doc.image_path}")
    else:
        print(f"Text document found: {doc.text[:50]}...")

# Create the index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embedding_function,
)
```

If you're still not seeing your images being retrieved, you might need to ensure that your `SimpleDirectoryReader` is configured to load images. You can do this by specifying the file extensions to include:

```python
documents = SimpleDirectoryReader("./guide", file_extractor={
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image"
}).load_data()
```

Remember, the exact implementation might vary depending on the version of LlamaIndex you're using. If you continue to face issues, I recommend checking the latest LlamaIndex documentation or reaching out to the LlamaIndex community for the most up-to-date guidance on working with multi-modal data and OpenCLIP embeddings.
----------------------
Failed to load file /guide/dmobject.png with error: 'str' object has no attribute 'load_data'. Skipping... @kapa.ai
Let me check the knowledge base for you again. One moment! :books:
----------------------
I understand you're encountering an error while trying to load an image file using LlamaIndex. Let's analyze this error and provide a solution based on the information available in the knowledge sources.

The error message you're seeing suggests that the SimpleDirectoryReader is trying to call a load_data method on a string object, which doesn't exist. This typically happens when the file reader is not properly configured to handle image files.

To resolve this issue, you need to ensure that you're using the correct file extractor for image files. Here's how you can modify your SimpleDirectoryReader configuration to properly handle image files:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import ImageReader

# Create an instance of ImageReader
image_reader = ImageReader()

# Configure SimpleDirectoryReader with the ImageReader for .png files
documents = SimpleDirectoryReader(
    input_dir="./guide",
    file_extractor={
        ".png": image_reader
    }
).load_data()
```


This configuration tells the SimpleDirectoryReader to use the ImageReader specifically for .png files. The ImageReader is designed to handle image files and will properly load them into the index.

If you're dealing with multiple image formats, you can extend the file_extractor dictionary:

```python
file_extractor={
    ".png": image_reader,
    ".jpg": image_reader,
    ".jpeg": image_reader
}
```


After making these changes, the SimpleDirectoryReader should be able to load your image files without the error you're encountering.

For more detailed information on using SimpleDirectoryReader and handling different file types, you can refer to the LlamaIndex documentation on SimpleDirectoryReader.
If you continue to face issues after implementing these changes, it might be helpful to check if the file path is correct and if the image file is accessible. Also, ensure that you have the necessary dependencies installed for handling image files in your Python environment.

SimpleDirectoryReader Documentation
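
As a quick sanity check (a sketch, not from the documentation), you can list what the reader actually loaded and confirm the image files show up as image documents:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.schema import ImageDocument
from llama_index.readers.file import ImageReader

documents = SimpleDirectoryReader(
    input_dir="./guide",
    file_extractor={".png": ImageReader()},
).load_data()

for doc in documents:
    kind = "image" if isinstance(doc, ImageDocument) else "text"
    print(kind, doc.metadata.get("file_path"))
```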
----------------------
@kapa.ai the issue was that there was no recursive specified in my SimpleDirectoryReader and the images were not getting indexed properly
Let me check the knowledge base for you again. One moment! :books:
----------------------
I understand that you've identified the issue with your image indexing. You're correct that not specifying the recursive parameter in your SimpleDirectoryReader can lead to images not being indexed properly, especially if they are located in subdirectories. Let's address this based on the information from our knowledge sources.

To fix this issue and ensure that all images in subdirectories are properly indexed, you should modify your SimpleDirectoryReader configuration to include the recursive parameter. Here's how you can do it:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import ImageReader

# Create an instance of ImageReader
image_reader = ImageReader()

# Configure SimpleDirectoryReader with the ImageReader for image files and set recursive to True
documents = SimpleDirectoryReader(
    input_dir="./guide",
    recursive=True,
    file_extractor={
        ".png": image_reader,
        ".jpg": image_reader,
        ".jpeg": image_reader
    }
).load_data()
```


By setting recursive=True, the SimpleDirectoryReader will search through all subdirectories within the specified input_dir, ensuring that all image files are found and indexed properly.

This approach is supported by the documentation on SimpleDirectoryReader. According to the SimpleDirectoryReader Usage guide, you can enable recursive searching in subdirectories by setting recursive=True:

```python
SimpleDirectoryReader(input_dir="path/to/directory", recursive=True)
```


With this modification, your image files should now be properly indexed, regardless of their location within the directory structure. This should resolve the issue you were facing with images not being indexed correctly.
Remember to re-run your indexing process after making this change to ensure all your images are included in the index. If you encounter any further issues or have additional questions, please don't hesitate to ask.
----------------------
@kapa.ai the OpenCLIP setup is not working very well, can I somehow use an OpenAI multimodal model instead for embedding and indexing the images?
Let me check the knowledge base for you again. One moment! :books:
----------------------
Yes, you can use OpenAI's multimodal model for embedding and indexing images instead of OpenCLIP. LlamaIndex supports OpenAI's multimodal capabilities, which can potentially provide better performance for your use case. Let me guide you through how to set this up based on the information from our knowledge sources.

To use OpenAI's multimodal model for embedding and indexing images, you'll need to use the OpenAIMultiModal class. Here's how you can modify your code to use it:

  1. First, import the necessary classes:

```python
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
```

  2. Set up the OpenAI multimodal model:

```python
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview",
    max_new_tokens=300,
)
```

  3. Load your documents, including images:

```python
documents = SimpleDirectoryReader("./guide", recursive=True).load_data()
```

  4. Create the multimodal index:

```python
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    openai_mm_llm=openai_mm_llm,
)
```

  5. Use the index for retrieval:

```python
retriever = index.as_retriever(similarity_top_k=5)
retrieval_results = retriever.retrieve("your query here")
```


This setup uses OpenAI's GPT-4 Vision model, which can handle both text and images. It should provide better performance for multimodal tasks compared to OpenCLIP.

Keep in mind that using OpenAI's model will require an OpenAI API key and may incur costs based on your usage.
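
To see what the retriever actually returns (a sketch based on the retrieval step above; it assumes `retrieval_results` from that snippet), you can separate image nodes from text nodes:

```python
from llama_index.core.schema import ImageNode

for result in retrieval_results:
    if isinstance(result.node, ImageNode):
        print(f"image: {result.node.metadata.get('file_path')} (score={result.score})")
    else:
        print(f"text: {result.node.get_content()[:80]}... (score={result.score})")
```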

For more detailed information on using OpenAI's multimodal capabilities with LlamaIndex, you can refer to these resources:
  1. OpenAI Multi-Modal example
  2. Multi-Modal LLM Modules
These resources provide additional context and examples for working with OpenAI's multimodal models in LlamaIndex.

If you need any further assistance or have more questions about implementing this approach, please feel free to ask.
----------------------
@kapa.ai I am using Azure multi-modal instead, how can I use that?
Let me check the knowledge base for you again. One moment! :books:
----------------------
Certainly! I can help you use Azure's multimodal capabilities with LlamaIndex instead of OpenAI's. Azure OpenAI Service provides similar functionality to OpenAI, but with the added benefits of Azure's enterprise-grade security and compliance features. Let's walk through how to set this up using the information from our knowledge sources.

To use Azure's multimodal model for embedding and indexing images in LlamaIndex, you'll need to use the AzureOpenAIMultiModal class. Here's how you can modify your code:

  1. First, import the necessary class:

```python
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
```

  2. Set up the Azure OpenAI multimodal model:

```python
azure_openai_llm = AzureOpenAIMultiModal(
    engine="my-gpt-4o-mini",  # Replace with your deployed model name
    model="gpt-4o-mini",
    api_key="your_azure_api_key",
    azure_endpoint="https://YOUR_URL.openai.azure.com/",
    api_version="2024-02-15-preview",
    max_new_tokens=300,
)
```

  3. Load your documents, including images:

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./guide", recursive=True).load_data()
```

  4. Create the multimodal index:

```python
from llama_index.core.indices import MultiModalVectorStoreIndex

index = MultiModalVectorStoreIndex.from_documents(
    documents,
    openai_mm_llm=azure_openai_llm,
)
```

  5. Use the index for retrieval:

```python
retriever = index.as_retriever(similarity_top_k=5)
retrieval_results = retriever.retrieve("your query here")
```


Make sure to replace "your_azure_api_key", "YOUR_URL", and "my-gpt-4o-mini" with your actual Azure OpenAI API key, endpoint URL, and deployed model name respectively.

Additionally, you'll need to set up your Azure environment variables:

```python
import os
os.environ["AZURE_OPENAI_API_KEY"] = "your_azure_api_key"
os.environ["AZURE_OPENAI_ENDPOINT"] = "
os.environ["OPENAI_API_VERSION"] = "2024-02-15-preview"
```

Remember that to use Azure OpenAI Service, you need to:

  1. Set up an Azure subscription
  2. Apply for access to Azure OpenAI Service
  3. Create a resource in the Azure portal
  4. Deploy a model in Azure OpenAI Studio
For more detailed information on using Azure OpenAI's multimodal capabilities with LlamaIndex, you can refer to this resource:

Azure OpenAI Multi-Modal

This guide provides additional context and examples for working with Azure OpenAI's multimodal models in LlamaIndex.

If you need any further assistance or have more questions about implementing this approach with Azure, please don't hesitate to ask.
----------------------
@kapa.ai

OK, how am I going to use the Azure OpenAI LLM to describe the image?

```python
from llama_index.core.llms import (
    ChatMessage,
    ImageBlock,
    TextBlock,
    MessageRole,
)

msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the images as an alternative text"),
        ImageBlock(image=image_document.image_resource.data),
    ],
)

response = azure_openai_llm.chat(messages=[msg])
```

But then how do I take this text and associate it with a document that can be loaded into a vector store?
Let me check the knowledge base for you again. One moment! :books:
----------------------
To use Azure OpenAI's multimodal model to describe an image and then associate the description with a document that can be loaded into a vector store, you can follow these steps:

  1. First, use the Azure OpenAI multimodal model to generate a description of the image:

```python
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
from llama_index.core.llms import (
    ChatMessage,
    ImageBlock,
    TextBlock,
    MessageRole,
)

azure_openai_llm = AzureOpenAIMultiModal(
    engine="my-gpt-4o-mini",  # Replace with your deployed model name
    model="gpt-4o-mini",
    max_new_tokens=300,
)

msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the images as an alternative text"),
        ImageBlock(image=image_document.image_resource.data),
    ],
)

response = azure_openai_llm.chat(messages=[msg])
```

  2. After getting the description, create a new document that includes both the image and its description:

```python
from llama_index.core.schema import Document, ImageDocument

# Create a new document with the image description
image_description = response.message.content  # the text of the assistant's reply
new_document = Document(
    text=image_description,
    metadata={
        "image_path": image_document.metadata.get("file_path", ""),
        "original_image_data": image_document.image_resource.data
    }
)
```

  3. Now you can use this new document to create a vector store index:
```python
from llama_index.core import VectorStoreIndex
from llama_index.core import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

# Create a Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_db")
vector_store = QdrantVectorStore(client=client, collection_name="image_descriptions")

# Create a storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the index with the new document
index = VectorStoreIndex.from_documents(
    [new_document],
    storage_context=storage_context,
)
```

This approach allows you to:
  1. Use Azure OpenAI's multimodal model to generate a description of the image.
  2. Create a new document that contains both the image description and metadata about the original image.
  3. Store this document in a vector store index, which can be used for retrieval later.
Remember to replace "my-gpt-4o-mini" with your actual deployed model name in Azure OpenAI.

For more information on using Azure OpenAI with LlamaIndex, you can refer to the Azure OpenAI Multi-Modal example.

This approach allows you to associate the image description with a document that can be easily loaded into a vector store and used for retrieval tasks.
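
As a usage sketch (assuming an LLM is configured for LlamaIndex, for example via Settings.llm; the query text is illustrative), you can then query the description-backed index and inspect which images the answers came from:

```python
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What does the in-memory model diagram show?")

print(response)
for node in response.source_nodes:
    print("source image:", node.node.metadata.get("image_path"))
```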
----------------------
@kapa.ai if I have a markdown that I am reading in as an image and that markdown refers to an image file how do i associate it in the same document?
Let me check the knowledge base for you again. One moment! :books:
----------------------
To associate a markdown file that refers to an image file within the same document in LlamaIndex, you can use a combination of the SimpleDirectoryReader and custom document processing. Here's a step-by-step approach to achieve this:

  1. First, use the SimpleDirectoryReader to load both markdown and image files:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import ImageReader

image_reader = ImageReader()

documents = SimpleDirectoryReader(
    input_dir="./your_directory",
    recursive=True,
    file_extractor={
        ".md": "markdown",
        ".png": image_reader,
        ".jpg": image_reader,
        ".jpeg": image_reader
    }
).load_data()
```


  2. After loading the documents, you'll need to process them to associate the markdown content with its referenced images. Here's a function to do that:

```python
import os
import re
from llama_index.core.schema import Document, ImageDocument

def associate_markdown_with_images(documents):
    markdown_docs = {}
    image_docs = {}

    # Separate markdown and image documents
    for doc in documents:
        if isinstance(doc, ImageDocument):
            image_docs[doc.metadata['file_path']] = doc
        else:
            markdown_docs[doc.metadata['file_path']] = doc

    # Process markdown documents to find image references
    for md_path, md_doc in markdown_docs.items():
        # Regular expression to find image references in markdown
        image_refs = re.findall(r"!\[.*?\]\((.*?)\)", md_doc.text)

        associated_images = []
        for img_ref in image_refs:
            img_path = os.path.join(os.path.dirname(md_path), img_ref)
            if img_path in image_docs:
                associated_images.append(image_docs[img_path])

        # Create a new document with associated images
        new_doc = Document(
            text=md_doc.text,
            metadata={
                **md_doc.metadata,
                "associated_images": [img.metadata['file_path'] for img in associated_images]
            }
        )
        new_doc.metadata['image_documents'] = associated_images

        yield new_doc

# Process the documents
processed_documents = list(associate_markdown_with_images(documents))
```

  3. Now you can use these processed documents to create your index:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(processed_documents)
```

This approach allows you to:
  1. Load both markdown and image files using SimpleDirectoryReader.
  2. Process the loaded documents to associate markdown content with referenced images.
  3. Create new documents that contain both the markdown text and references to the associated images.
  4. Use these processed documents to create your index.
The resulting documents will have the markdown text in their text field, and the associated image documents will be stored in the image_documents field of the metadata. This allows you to access both the markdown content and the related images when retrieving documents from the index.
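
A small check (a sketch) that the association worked: each processed document should now carry the file paths of the images its markdown referenced.

```python
for doc in processed_documents:
    print(doc.metadata.get("file_path"), "->", doc.metadata.get("associated_images"))
```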

For more information on working with different document types in LlamaIndex, you can refer to the SimpleDirectoryReader documentation.

Remember to adjust the file paths and regular expressions as needed to match your specific markdown and image file structure.
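
If you want to verify the regular expression itself, a quick standalone test (the sample line is made up for illustration) looks like this:

```python
import re

sample = "See the diagram: ![in-memory model](images/dmobject.png)"
print(re.findall(r"!\[.*?\]\((.*?)\)", sample))  # ['images/dmobject.png']
```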
----------------------
@kapa.ai I have the code:

```python
import os
import re

from llama_index.core import SimpleDirectoryReader
from llama_index.core.schema import Document, ImageDocument
from llama_index.readers.file import ImageReader

image_reader = ImageReader()

documents = SimpleDirectoryReader(
    input_dir="./your_directory",
    recursive=True,
    file_extractor={
        ".md": "markdown",
        ".png": image_reader,
        ".jpg": image_reader,
        ".jpeg": image_reader,
        ".gif": image_reader,
    },
).load_data()

def associate_markdown_with_images(documents):
    markdown_docs = {}
    image_docs = {}

    for doc in documents:
        if isinstance(doc, ImageDocument):
            image_docs[doc.metadata["file_path"]] = doc
        else:
            markdown_docs[doc.metadata["file_path"]] = doc

    for md_path, md_doc in markdown_docs.items():
        # Regular expression to find image references in markdown
        image_refs = re.findall(r"!\[.*?\]\((.*?)\)", md_doc.text)

        associated_images = []
        for img_ref in image_refs:
            img_path = os.path.join(os.path.dirname(md_path), img_ref)
            if img_path in image_docs:
                associated_images.append(image_docs[img_path])

        new_doc = Document(
            text=md_doc.text,
            metadata={
                **md_doc.metadata,
                "associated_images": [img.metadata["file_path"] for img in associated_images],
            },
        )
        new_doc.metadata["image_documents"] = associated_images

        yield new_doc


processed_documents = list(associate_markdown_with_images(documents))
```

Now I need to explain the image:

```python
from llama_index.core.llms import ChatMessage, ImageBlock, MessageRole, TextBlock

msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the images as an alternative text"),
        ImageBlock(image=image_document.image_resource.data),
    ],
)

response = azure_openai_llm.chat(messages=[msg])
```
Let me check the knowledge base for you again. One moment! :books:
----------------------
Based on the code you've provided and your request to explain the image, I'll guide you on how to integrate the image description functionality into your existing code using Azure OpenAI's multimodal capabilities.

First, let's modify your associate_markdown_with_images function to include image descriptions:

```python
import os
import re

from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
from llama_index.core.llms import ChatMessage, ImageBlock, MessageRole, TextBlock
from llama_index.core.schema import Document, ImageDocument

# Initialize Azure OpenAI multimodal model
azure_openai_llm = AzureOpenAIMultiModal(
    engine="your-deployed-model-name",  # Replace with your actual model name
    model="gpt-4o-mini",
    max_new_tokens=300,
)

def associate_markdown_with_images(documents):
    markdown_docs = {}
    image_docs = {}

    for doc in documents:
        if isinstance(doc, ImageDocument):
            image_docs[doc.metadata["file_path"]] = doc
        else:
            markdown_docs[doc.metadata["file_path"]] = doc

    for md_path, md_doc in markdown_docs.items():
        # find markdown image references
        image_refs = re.findall(r"!\[.*?\]\((.*?)\)", md_doc.text)

        associated_images = []
        for img_ref in image_refs:
            img_path = os.path.join(os.path.dirname(md_path), img_ref)
            if img_path in image_docs:
                img_doc = image_docs[img_path]

                # Generate image description
                msg = ChatMessage(
                    role=MessageRole.USER,
                    blocks=[
                        TextBlock(text="Describe the image as an alternative text"),
                        ImageBlock(image=img_doc.image_resource.data),
                    ],
                )
                response = azure_openai_llm.chat(messages=[msg])
                img_description = response.message.content
                associated_images.append({
                    "image_doc": img_doc,
                    "description": img_description
                })

        new_doc = Document(
            text=md_doc.text,
            metadata={
                **md_doc.metadata,
                "associated_images": [
                    {
                        "file_path": img["image_doc"].metadata["file_path"],
                        "description": img["description"]
                    } for img in associated_images
                ],
            },
        )
        new_doc.metadata["image_documents"] = [img["image_doc"] for img in associated_images]

        yield new_doc

# Process the documents
processed_documents = list(associate_markdown_with_images(documents))
```

This modified version of the function does the following:

  1. It initializes the Azure OpenAI multimodal model.
  2. For each image referenced in a markdown file, it generates a description using the Azure OpenAI model.
  3. It stores both the image document and its description in the `associated_images` list.
  4. The new document's metadata includes both the file paths and descriptions of associated images.

Now, when you create your index with these processed documents, each document will contain:
- The original markdown text
- References to associated image files
- Descriptions of those images generated by the Azure OpenAI model

You can use this enriched data for more comprehensive indexing and retrieval. For example, you could search not just the markdown text, but also the image descriptions.

To use this in your vector store index:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(processed_documents)
```
This approach allows you to associate markdown content with both the referenced images and their AI-generated descriptions, providing a richer context for your documents in the index.

Remember to replace "your-deployed-model-name" with your actual Azure OpenAI deployed model name, and ensure you have the necessary permissions and API keys set up for Azure OpenAI.

For more information on using Azure OpenAI with LlamaIndex, you can refer to the Azure OpenAI Multi-Modal example.
----------------------