Updated 5 days ago

@kapa.ai I have a markdown file with an image embedded in it. Is there a reader that creates both the document and image document with the images being described with a multi modal model?
Please stand by @cmosguy as I search the knowledge base. I normally take about 15 seconds or so. :wave:
----------------------
Based on the knowledge sources provided, there isn't a specific reader in LlamaIndex that automatically creates both document and image documents from a markdown file while describing images with a multi-modal model. However, I can suggest a potential approach using existing components of LlamaIndex to achieve a similar result.

You could use a combination of the following:

  1. A markdown reader to parse the markdown file and extract text and image paths.
  2. A multi-modal LLM to generate descriptions for the images.
  3. A custom process to create separate document and image documents.

Here's a general outline of how you might approach this:

  1. Use a markdown parser to extract text and image paths from your markdown file.
  2. For each image, create an ImageDocument using the SimpleDirectoryReader or by manually creating ImageDocument instances.
  3. Use a multi-modal LLM like OpenAI's GPT-4 Vision to generate descriptions for each image.
  4. Create text documents for the markdown content.
  5. Combine these into a single collection of documents.
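Step 1 of the outline above can be sketched with the standard library alone. This is a minimal illustration, not a LlamaIndex API: `extract_image_refs` and `IMAGE_PATTERN` are hypothetical names, and the pattern only covers inline images of the form `![alt](path)`, not reference-style images or HTML `<img>` tags.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches inline Markdown images: ![alt text](path "optional title")
IMAGE_PATTERN = re.compile(
    r'!\[(?P<alt>[^\]]*)\]\((?P<path>[^)\s]+)(?:\s+"[^"]*")?\)'
)


def extract_image_refs(markdown_text: str, base_dir: str = ".") -> List[Tuple[str, Path]]:
    """Return (alt_text, path) pairs for every inline image in the text.

    Paths are joined onto base_dir, the directory holding the markdown file.
    """
    base = Path(base_dir)
    return [
        (match.group("alt"), base / match.group("path"))
        for match in IMAGE_PATTERN.finditer(markdown_text)
    ]


sample = "# Report\n\nSome text.\n\n![Sales chart](images/sales.png)\n\nMore text.\n"
refs = extract_image_refs(sample, base_dir="docs")
for alt, path in refs:
    print(alt, path)
```

The resulting paths could then be handed to whatever loads the images, for example by collecting them in one directory for SimpleDirectoryReader.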
Here's a rough example of how you might use the OpenAI multi-modal model to describe images:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load the images referenced by the markdown file as ImageDocument objects.
# Assumes the image files have been collected under "path_to_images".
image_documents = SimpleDirectoryReader("path_to_images").load_data()

# Multi-modal LLM used to generate the descriptions.
openai_mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)

# Ask the model for alt-text style descriptions of the loaded images.
response = openai_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)

print(response)
```

This approach would require some custom coding to integrate with your markdown parsing, but it allows you to leverage LlamaIndex's multi-modal capabilities to describe the images.
Remember that you'll need to handle the markdown parsing separately, as the provided knowledge sources don't mention a specific markdown reader that integrates with multi-modal image description.
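To make the "custom process" concrete, here is a rough sketch of steps 4 and 5: one text document for the markdown content plus one image document per image, each carrying a generated description. `TextDoc`, `ImageDoc`, `build_documents`, and `fake_describe` are hypothetical stand-ins so the example runs without any dependencies. In a real pipeline you would presumably swap them for LlamaIndex's own Document and ImageDocument classes and pass a `describe` function that wraps the multi-modal LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class TextDoc:
    """Stand-in for a text document (hypothetical, not a LlamaIndex class)."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ImageDoc:
    """Stand-in for an image document that carries a generated description."""
    image_path: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_documents(
    markdown_text: str,
    source: str,
    image_paths: List[str],
    describe: Callable[[str], str],
) -> List[Union[TextDoc, ImageDoc]]:
    """Create one text document plus one described image document per image."""
    docs: List[Union[TextDoc, ImageDoc]] = [
        TextDoc(text=markdown_text, metadata={"source": source})
    ]
    for path in image_paths:
        # describe() is where a multi-modal model would be called.
        docs.append(
            ImageDoc(image_path=path, text=describe(path), metadata={"source": source})
        )
    return docs


def fake_describe(path: str) -> str:
    """Placeholder for a real multi-modal LLM call."""
    return f"Placeholder description of {path}"


docs = build_documents(
    "# Title\n\n![chart](chart.png)\n", "notes.md", ["chart.png"], fake_describe
)
for doc in docs:
    print(type(doc).__name__, doc.metadata["source"])
```

Keeping the same `source` value in each document's metadata is one way to link an image description back to the markdown file it came from.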
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai does the SimpleDirectoryReader return ImageDocument objects?
Let me check the knowledge base for you again. One moment! :books:
----------------------
Yes, the SimpleDirectoryReader can return ImageDocument objects when loading image files. This functionality is supported in LlamaIndex, as evidenced by several examples in the provided knowledge sources.

According to the LlamaIndex documentation, the SimpleDirectoryReader can be used to load image data as follows:

```python
from llama_index.core import SimpleDirectoryReader

# put your local directory here
image_documents = SimpleDirectoryReader("./tests/data/").load_data()
```

This code snippet demonstrates that the SimpleDirectoryReader can load data from a directory, which includes image files, and return them as document objects.

Furthermore, the LlamaIndex API reference shows that when loading image files, the reader returns a list of Document objects, specifically ImageDocument instances for image files.

Note that the SimpleDirectoryReader picks a reader internally based on the file type. For image files, it uses an ImageReader, which creates ImageDocument objects. These ImageDocument objects carry the image path and, depending on how the reader is configured, the image data and any extracted or generated text associated with the image.

If you need to work with the images as ImageDocument objects, you can access them directly from the list returned by SimpleDirectoryReader's load_data() method.
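The dispatch-by-file-type idea described above can be illustrated with a small standalone sketch. This is not LlamaIndex's actual implementation: `READERS`, `read_text`, `read_image`, and `load_files` are hypothetical names, plain dicts stand in for document objects, and the fallback to a text reader for unknown extensions is an assumption of this sketch.

```python
from pathlib import Path
from typing import Callable, Dict, List


def read_text(path: Path) -> dict:
    """Stand-in reader for text-like files."""
    return {"kind": "text", "path": str(path)}


def read_image(path: Path) -> dict:
    """Stand-in reader for image files."""
    return {"kind": "image", "path": str(path)}


# Map file extensions to the reader that handles them.
READERS: Dict[str, Callable[[Path], dict]] = {
    ".md": read_text,
    ".txt": read_text,
    ".png": read_image,
    ".jpg": read_image,
    ".jpeg": read_image,
}


def load_files(paths: List[str]) -> List[dict]:
    """Dispatch each file to a reader chosen by its (lowercased) extension."""
    docs = []
    for path in map(Path, paths):
        reader = READERS.get(path.suffix.lower(), read_text)  # assumed fallback
        docs.append(reader(path))
    return docs


loaded = load_files(["notes.md", "figure.PNG"])
# Pick out only the image documents from the mixed list.
image_docs = [doc for doc in loaded if doc["kind"] == "image"]
print(image_docs)
```

With the real library, the equivalent filtering step would presumably be an `isinstance` check against ImageDocument on the list returned by load_data().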
----------------------