```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

# set default text and image embedding functions
embedding_function = OpenCLIPEmbeddingFunction()
```
The text content is stored in the `text` attribute of the Document object.

```python
import chromadb

from chromadb.utils.data_loaders import ImageLoader
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

embedding_function = OpenCLIPEmbeddingFunction()
image_loader = ImageLoader()

# create client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.get_or_create_collection(
    "multimodal_collection",
    embedding_function=embedding_function,
    data_loader=image_loader,
)

# load documents
documents = SimpleDirectoryReader("./guide").load_data()

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embedding_large,  # embedding_large is defined elsewhere in my setup
)
```

When I do a:

```python
retriever = index.as_retriever(similarity_top_k=50)
retrieval_results = retriever.retrieve("what is the in memory model")
```
First, check that your images are actually being loaded by inspecting the `documents` variable after loading:

```python
documents = SimpleDirectoryReader("./guide").load_data()

for doc in documents:
    if hasattr(doc, 'image'):
        print(f"Image loaded: {doc.metadata['file_path']}")
```
You can also check the image embeddings directly:

```python
from PIL import Image

image_path = "./guide/foobar.png"
image = Image.open(image_path)
image_embedding = embedding_function.encode_image(image)
print(f"Image embedding shape: {image_embedding.shape}")
```
Next, confirm that image nodes actually made it into the index:

```python
for node in index.docstore.docs.values():
    if hasattr(node, 'image'):
        print(f"Image in index: {node.metadata['file_path']}")
```
Then inspect what the retriever returns:

```python
retriever = index.as_retriever(similarity_top_k=50)
retrieval_results = retriever.retrieve("what is the in memory model")

for result in retrieval_results:
    if hasattr(result.node, 'image'):
        print(f"Retrieved image: {result.node.metadata['file_path']}")
    else:
        print(f"Retrieved text: {result.node.text[:100]}...")
```
For image-to-image retrieval:

```python
retriever = index.as_retriever(image_similarity_top_k=5)
image_retrieval_results = retriever.image_to_image_retrieve("./guide/foobar.png")

for result in image_retrieval_results:
    print(f"Retrieved image: {result.node.metadata['file_path']}")
```
These checks will help you confirm that your `MultiModalVectorStoreIndex` is correctly configured to handle both text and images.

However, `OpenCLIPEmbeddingFunction` doesn't have a direct `encode_image` method as I incorrectly suggested. Let's correct this and provide a more accurate way to check your image embeddings using the OpenCLIP embedding function in LlamaIndex:

```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
import base64
from PIL import Image
from io import BytesIO

# Initialize the embedding function
embedding_function = OpenCLIPEmbeddingFunction()

# Load and prepare the image
image_path = "./guide/foobar.png"
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode()

# Create a list with a single image (OpenCLIP expects a list)
images = [image_data]

# Get the embedding
image_embedding = embedding_function(images)
print(f"Image embedding shape: {len(image_embedding[0])}")
```
If you're still not seeing your images being retrieved, you might need to ensure that your `SimpleDirectoryReader` is configured to load images. You can do this by specifying the file extensions to include:
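For example, a minimal sketch using the `required_exts` parameter (assuming your files live under `./guide`; adjust the extension list to your data):

```python
from llama_index.core import SimpleDirectoryReader

# required_exts filters which files SimpleDirectoryReader picks up,
# so only the listed extensions are loaded
documents = SimpleDirectoryReader(
    "./guide",
    required_exts=[".md", ".png", ".jpg", ".jpeg"],
).load_data()
```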
The error you're seeing means that `SimpleDirectoryReader` is trying to call a `load_data` method on a string object, which doesn't exist. This typically happens when the file reader is not properly configured to handle image files. Update your `SimpleDirectoryReader` configuration to properly handle image files:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import ImageReader

# Create an instance of ImageReader
image_reader = ImageReader()

# Configure SimpleDirectoryReader with the ImageReader for .png files
documents = SimpleDirectoryReader(
    input_dir="./guide",
    file_extractor={".png": image_reader},
).load_data()
```
This tells `SimpleDirectoryReader` to use the `ImageReader` specifically for `.png` files. The `ImageReader` is designed to handle image files and will properly load them into the index. If you have other image formats, add them to the `file_extractor` dictionary as well:

```python
file_extractor={
    ".png": image_reader,
    ".jpg": image_reader,
    ".jpeg": image_reader,
}
```
With this configuration, `SimpleDirectoryReader` should be able to load your image files without the error you're encountering. For more information on `SimpleDirectoryReader` and handling different file types, you can refer to the LlamaIndex documentation on SimpleDirectoryReader.
Indeed, omitting the `recursive` parameter in your `SimpleDirectoryReader` can lead to images not being indexed properly, especially if they are located in subdirectories. Let's address this based on the information from our knowledge sources. Update your `SimpleDirectoryReader` configuration to include the `recursive` parameter. Here's how you can do it:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import ImageReader

# Create an instance of ImageReader
image_reader = ImageReader()

# Configure SimpleDirectoryReader with the ImageReader for image files
# and set recursive to True so subdirectories are searched as well
documents = SimpleDirectoryReader(
    input_dir="./guide",
    recursive=True,
    file_extractor={
        ".png": image_reader,
        ".jpg": image_reader,
        ".jpeg": image_reader,
    },
).load_data()
```
By setting `recursive=True`, the `SimpleDirectoryReader` will search through all subdirectories within the specified `input_dir`, ensuring that all image files are found and indexed properly. This is the documented behaviour of `SimpleDirectoryReader`: according to the SimpleDirectoryReader Usage guide, you can enable recursive searching of subdirectories by setting `recursive=True`:

```python
SimpleDirectoryReader(input_dir="path/to/directory", recursive=True)
```
To query with a multimodal LLM, you can use the `OpenAIMultiModal` class. Here's how you can modify your code to use it:

```python
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex

# Initialize the multimodal LLM
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview",
    max_new_tokens=300,
)

# Load documents (including images) from the directory tree
documents = SimpleDirectoryReader("./guide", recursive=True).load_data()

# Build a multimodal index over the documents
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    openai_mm_llm=openai_mm_llm,
)

# Retrieve as before
retriever = index.as_retriever(similarity_top_k=5)
retrieval_results = retriever.retrieve("your query here")
```
If you're using Azure OpenAI, use the `AzureOpenAIMultiModal` class instead. Here's how you can modify your code:

```python
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
from llama_index.core import SimpleDirectoryReader
from llama_index.core.indices import MultiModalVectorStoreIndex

# Initialize the Azure OpenAI multimodal LLM
azure_openai_llm = AzureOpenAIMultiModal(
    engine="my-gpt-4o-mini",  # Replace with your deployed model name
    model="gpt-4o-mini",
    api_key="your_azure_api_key",
    azure_endpoint="https://YOUR_URL.openai.azure.com/",
    api_version="2024-02-15-preview",
    max_new_tokens=300,
)

# Load documents (including images)
documents = SimpleDirectoryReader("./guide", recursive=True).load_data()

# Build the multimodal index
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    openai_mm_llm=azure_openai_llm,
)

# Retrieve
retriever = index.as_retriever(similarity_top_k=5)
retrieval_results = retriever.retrieve("your query here")
```
"your_azure_api_key"
, "YOUR_URL"
, and "my-gpt-4o-mini"
with your actual Azure OpenAI API key, endpoint URL, and deployed model name respectively.from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal from llama_index.core.llms import ( ChatMessage, ImageBlock, TextBlock, MessageRole, ) azure_openai_llm = AzureOpenAIMultiModal( engine="my-gpt-4o-mini", # Replace with your deployed model name model="gpt-4o-mini", max_new_tokens=300, ) msg = ChatMessage( role=MessageRole.USER, blocks=[ TextBlock(text="Describe the images as an alternative text"), ImageBlock(image=image_document.image_resource.data), ], ) response = azure_openai_llm.chat(messages=[msg])
```python
from llama_index.core.schema import Document, ImageDocument

# Create a new document with the image description
image_description = response.message.content  # the chat response's message content holds the description
new_document = Document(
    text=image_description,
    metadata={
        "image_path": image_document.metadata.get("file_path", ""),
        "original_image_data": image_document.image_resource.data,
    },
)
```
This approach allows you to:
1. Use Azure OpenAI's multimodal model to generate a description of the image.
2. Create a new document that contains both the image description and metadata about the original image.
3. Store this document in a vector store index, which can be used for retrieval later.
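For step 3, a minimal sketch (assuming the `new_document` created above and the default in-memory vector store; adapt to your own storage setup):

```python
from llama_index.core import VectorStoreIndex

# Index the description document; the original image path and data
# are preserved in its metadata
index = VectorStoreIndex.from_documents([new_document])

# Text queries will now match against the generated image description
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("your query here")
```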
Remember to replace `"my-gpt-4o-mini"` with your actual deployed model name in Azure OpenAI.

To combine your markdown files with the images they reference, you can use `SimpleDirectoryReader` together with some custom document processing. Here's a step-by-step approach to achieve this:

1. First, use `SimpleDirectoryReader` to load both markdown and image files:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import ImageReader, MarkdownReader

image_reader = ImageReader()
markdown_reader = MarkdownReader()

# Configure SimpleDirectoryReader with explicit readers per extension
documents = SimpleDirectoryReader(
    input_dir="./your_directory",
    recursive=True,
    file_extractor={
        ".md": markdown_reader,
        ".png": image_reader,
        ".jpg": image_reader,
        ".jpeg": image_reader,
    },
).load_data()
```
2. Apply your custom document processing to associate each markdown file with the images it references (the `associate_markdown_with_images` helper shown further below produces `processed_documents`).

3. Now you can use these processed documents to create your index:
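A minimal sketch of this step, assuming `processed_documents` is the list produced by the `associate_markdown_with_images` helper shown further below, and using a plain in-memory `VectorStoreIndex`:

```python
from llama_index.core import VectorStoreIndex

# processed_documents: markdown documents enriched with associated image
# metadata (see the helper function below)
index = VectorStoreIndex.from_documents(processed_documents)

# Query against the markdown text; associated image paths are available
# in each retrieved node's metadata
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("your query here")
```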
This approach allows you to load both markdown and image files using `SimpleDirectoryReader`. The markdown content is kept in each document's `text` field, and the associated image documents will be stored in the `image_documents` field of the metadata. This allows you to access both the markdown content and the related images when retrieving documents from the index:

```python
import os
import re

from llama_index.core import SimpleDirectoryReader
from llama_index.core.schema import Document, ImageDocument
from llama_index.readers.file import ImageReader, MarkdownReader

image_reader = ImageReader()
markdown_reader = MarkdownReader()

documents = SimpleDirectoryReader(
    input_dir="./your_directory",
    recursive=True,
    file_extractor={
        ".md": markdown_reader,
        ".png": image_reader,
        ".jpg": image_reader,
        ".jpeg": image_reader,
        ".gif": image_reader,
    },
).load_data()


def associate_markdown_with_images(documents):
    markdown_docs = {}
    image_docs = {}

    for doc in documents:
        if isinstance(doc, ImageDocument):
            image_docs[doc.metadata["file_path"]] = doc
        else:
            markdown_docs[doc.metadata["file_path"]] = doc

    for md_path, md_doc in markdown_docs.items():
        # Regular expression to find image references in markdown: ![alt](path)
        image_refs = re.findall(r"!\[.*?\]\((.*?)\)", md_doc.text)

        associated_images = []
        for img_ref in image_refs:
            img_path = os.path.join(os.path.dirname(md_path), img_ref)
            if img_path in image_docs:
                associated_images.append(image_docs[img_path])

        new_doc = Document(
            text=md_doc.text,
            metadata={
                **md_doc.metadata,
                "associated_images": [
                    img.metadata["file_path"] for img in associated_images
                ],
            },
        )
        new_doc.metadata["image_documents"] = associated_images
        yield new_doc


processed_documents = list(associate_markdown_with_images(documents))
```
You can also extend the `associate_markdown_with_images` function to include image descriptions (a sketch of such a modification follows after this summary). This modified version of the function does the following:

1. It initializes the Azure OpenAI multimodal model.
2. For each image referenced in a markdown file, it generates a description using the Azure OpenAI model.
3. It stores both the image document and its description in the `associated_images` list.
4. The new document's metadata includes both the file paths and descriptions of associated images.

Now, when you create your index with these processed documents, each document will contain:

- The original markdown text
- References to associated image files
- Descriptions of those images generated by the Azure OpenAI model

You can use this enriched data for more comprehensive indexing and retrieval. For example, you could search not just the markdown text, but also the image descriptions. To use this in your vector store index, see the second sketch below.
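A sketch of one possible implementation, assuming the `documents` loaded earlier, that each image document exposes its bytes via `image_resource.data` (as in the chat example above), and a metadata key `image_descriptions` chosen here for illustration:

```python
import os
import re

from llama_index.core.llms import ChatMessage, ImageBlock, MessageRole, TextBlock
from llama_index.core.schema import Document, ImageDocument
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal

# 1. Initialize the Azure OpenAI multimodal model
azure_openai_llm = AzureOpenAIMultiModal(
    engine="my-gpt-4o-mini",  # Replace with your deployed model name
    model="gpt-4o-mini",
    max_new_tokens=300,
)


def describe_image(image_doc):
    # 2. Generate a description for a single image document
    msg = ChatMessage(
        role=MessageRole.USER,
        blocks=[
            TextBlock(text="Describe the image as an alternative text"),
            ImageBlock(image=image_doc.image_resource.data),
        ],
    )
    response = azure_openai_llm.chat(messages=[msg])
    return response.message.content


def associate_markdown_with_images(documents):
    markdown_docs = {}
    image_docs = {}

    for doc in documents:
        if isinstance(doc, ImageDocument):
            image_docs[doc.metadata["file_path"]] = doc
        else:
            markdown_docs[doc.metadata["file_path"]] = doc

    for md_path, md_doc in markdown_docs.items():
        image_refs = re.findall(r"!\[.*?\]\((.*?)\)", md_doc.text)

        # 3. Keep each image document together with its generated description
        associated_images = []
        for img_ref in image_refs:
            img_path = os.path.join(os.path.dirname(md_path), img_ref)
            if img_path in image_docs:
                img_doc = image_docs[img_path]
                associated_images.append((img_doc, describe_image(img_doc)))

        # 4. Store both the file paths and the descriptions in the metadata
        new_doc = Document(
            text=md_doc.text,
            metadata={
                **md_doc.metadata,
                "associated_images": [
                    img.metadata["file_path"] for img, _ in associated_images
                ],
                "image_descriptions": [desc for _, desc in associated_images],
            },
        )
        yield new_doc


processed_documents = list(associate_markdown_with_images(documents))
```

And a second sketch for indexing the enriched documents, using a plain in-memory `VectorStoreIndex`:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(processed_documents)
```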
"your-deployed-model-name"
with your actual Azure OpenAI deployed model name, and ensure you have the necessary permissions and API keys set up for Azure OpenAI.