
hello there, I have successfully run through this notebook -> https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
Instead of saving the images locally, can I store them on Azure Blob Storage and have them read from there at query time? I want to keep the implementation the same; the only change is that instead of saving the images locally, I move them to cloud storage.
I think the download images function in the llama-parse client would have to allow you to pass an fs object that points to azure?

Otherwise, I think you'd have to use the raw api to download the file bytes and then push that to azure
https://docs.cloud.llamaindex.ai/category/API/parsing
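For what it's worth, here's a minimal sketch of that second approach: pull an image's bytes from the parsing API and push them straight to Azure, with no local file in between. The endpoint path and all the placeholder values here are assumptions to verify against the API docs linked above:

Plain Text
import requests
from azure.storage.blob import BlobServiceClient

# Assumptions: the API key, job_id, and image_name come from your parse
# job's JSON result; verify the endpoint path against the docs above.
API_KEY = "llx-..."
job_id = "..."  # id of the completed parse job
image_name = "page_1.jpg"

# download the raw image bytes from the parsing API
resp = requests.get(
    f"https://api.cloud.llamaindex.ai/api/parsing/job/{job_id}/result/image/{image_name}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()

# push the bytes straight to Azure Blob Storage
blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")
blob_client = blob_service_client.get_blob_client(
    container="your_container", blob=f"images/{image_name}"
)
blob_client.upload_blob(resp.content, overwrite=True)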
my PDFs are already in bytes before parsing with LlamaParse, so I'm already pushing them, as images, to azure blob storage concurrently. What I would like to know is: if the TextNodes contain blob file paths as metadata, how can I get the custom query engine to access them?
This is the agent response:
Plain Text
Added user message to memory: Extract all line items from each page and return them all in a valid JSON schema. For
each page verify all line items in the page before returning an answer.
=== Calling Function ===
Calling function: count_tool with args: {"input": "page 1"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== Calling Function ===
Calling function: table_tool with args: {"input": "page 1"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== Calling Function ===
Calling function: table_tool with args: {"input": "page 2"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== Calling Function ===
Calling function: table_tool with args: {"input": "page 3"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== LLM Response ===
It seems there was an error while trying to extract the line items from the pages. The error indicates an invalid argument related to accessing the document pages. Please check the document source or provide a valid input for further processing.
Do I need an additional tool that gives access to blob storage, or do I have to include a reader within my custom multimodal query engine?
Yea so the textnodes have it in the metadata. I think you'd need some custom step after retrieval to use those paths to fetch the images properly
ok so i need a custom retriever which contains the VectorStoreIndex and the custom step you mentioned. How can I write that?
Not custom retrieval (or it could be, I guess), but some step after running normal retrieval
Maybe you can wrap the existing retriever into a custom retriever, like this
https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers/
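As a rough sketch of that wrapping idea (the class name AzureImageRetriever and the image_path metadata key are assumptions based on this thread, not a stock class):

Plain Text
from typing import List

from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import ImageNode, NodeWithScore


class AzureImageRetriever(BaseRetriever):
    """Wraps a normal vector retriever; after retrieval, turns any blob
    image paths found in node metadata into ImageNode results."""

    def __init__(self, vector_retriever: BaseRetriever):
        self._vector_retriever = vector_retriever
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # run normal retrieval first
        text_nodes = self._vector_retriever.retrieve(query_bundle)

        # the "custom step": surface each node's image path as an ImageNode
        # (fetching the blob itself could also happen here)
        image_nodes = [
            NodeWithScore(
                node=ImageNode(image_path=n.metadata["image_path"]), score=n.score
            )
            for n in text_nodes
            if "image_path" in n.metadata
        ]
        return text_nodes + image_nodes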
on a high level, it's gonna look something like this?

Plain Text
index = VectorStoreIndex(text_nodes, embed_model=embed_model)
vector_retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
# image_retriever = ...  (some image retriever instance)
custom_retriever = CustomRetriever(vector_retriever, image_retriever)
query_engine = MultimodalQueryEngine(
    retriever=custom_retriever,
    multi_modal_llm=gpt_4o,
)
sorry how do i do this?
i see in the documentation there is BaseImageRetriever
Oh hmmm, I guess if using the multimodal query engine, you'll need to use BaseImageRetriever πŸ€”

Tbh you could just skip the query engine too (it's not doing much). Most of our latest multi-modal examples just use the llm directly.

see here
https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
That example will probably be helpful
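To illustrate the "use the llm directly" route, a hedged sketch (it assumes the images are already available at local paths; blob-hosted images would still need the download step discussed above):

Plain Text
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

gpt_4o = OpenAIMultiModal(model="gpt-4o")

# retrieve as usual, then hand text + images straight to the LLM,
# with no query engine in between
nodes = vector_retriever.retrieve("Extract all line items from page 1")
image_docs = [
    ImageDocument(image_path=n.metadata["image_path"])
    for n in nodes
    if "image_path" in n.metadata
]
context = "\n\n".join(n.node.get_content() for n in nodes)
response = gpt_4o.complete(
    prompt=f"Context:\n{context}\n\nExtract all line items from page 1.",
    image_documents=image_docs,
)
print(response.text)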
that example uses a query engine too πŸ˜΅β€πŸ’«
i also plan to extend the code with agents/tools so i might need a query engine?
I got this from RunLLM on your documentation, gonna try that out.

Plain Text
from pathlib import Path

from azure.storage.blob import BlobServiceClient
from llama_index.core.base.response.schema import Response
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.schema import ImageNode, NodeWithScore

class MultimodalQueryEngine(CustomQueryEngine):
    # ... existing code ...

    def custom_query(self, query_str: str) -> Response:
        # retrieve text nodes
        nodes = self.retriever.retrieve(query_str)

        # Initialize Azure Blob Service Client
        blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")

        # create ImageNode items from text nodes
        image_nodes = []
        for n in nodes:
            if "image_path" in n.metadata:
                try:
                    # Download the image from Azure Blob Storage. Note that
                    # blob= expects the blob's name/path within the container,
                    # not the full https:// URL.
                    blob_client = blob_service_client.get_blob_client(
                        container="your_container", blob=n.metadata["image_path"]
                    )
                    # Write to a local file named after the blob -- opening the
                    # full URL as a file path is what raised [Errno 22] above.
                    local_path = Path(n.metadata["image_path"]).name
                    with open(local_path, "wb") as download_file:
                        download_file.write(blob_client.download_blob().readall())

                    # Point the ImageNode at the local copy
                    image_nodes.append(
                        NodeWithScore(node=ImageNode(image_path=local_path))
                    )
                except Exception as e:
                    print(
                        f'Warning: Failed to create ImageNode from {n.metadata["image_path"]}: {str(e)}'
                    )
                    continue

...
@Logan M finally got it to work. is there a way to check if the images are really being used?
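One crude way to check, as a sketch (not a built-in feature): log the image nodes inside custom_query right before the LLM call, and compare the answer against a run where the images are withheld.

Plain Text
# inside MultimodalQueryEngine.custom_query, just before calling the LLM
print(f"Passing {len(image_nodes)} image(s) to the LLM:")
for img in image_nodes:
    print(" -", img.node.image_path)

# crude A/B test: re-run the same query with image_nodes forced to [] --
# if the answer degrades, the images were being used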