
hello there, I have successfully run through this notebook -> https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
Instead of saving the images locally, can I store them on Azure Blob Storage and have them read from there at query time? I want to keep the implementation the same; the only change is that instead of saving the images locally, I move them to cloud storage.
I think the download images function in the llama-parse client would have to allow you to pass an fs object that points to azure?

Otherwise, I think you'd have to use the raw api to download the file bytes and then push that to azure
https://docs.cloud.llamaindex.ai/category/API/parsing
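For what it's worth, here's a minimal sketch of that second approach: pull an image's bytes from the parsing API and push them straight to Azure, with no local file in between. The endpoint path and all the placeholder values here are assumptions to verify against the API docs linked above:

Plain Text
import requests
from azure.storage.blob import BlobServiceClient

# Assumptions: the API key, job_id, and image_name come from your parse
# job's JSON result; verify the endpoint path against the docs above.
API_KEY = "llx-..."
job_id = "..."  # id of the completed parse job
image_name = "page_1.jpg"

# download the raw image bytes from the parsing API
resp = requests.get(
    f"https://api.cloud.llamaindex.ai/api/parsing/job/{job_id}/result/image/{image_name}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()

# push the bytes straight to Azure Blob Storage
blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")
blob_client = blob_service_client.get_blob_client(
    container="your_container", blob=f"images/{image_name}"
)
blob_client.upload_blob(resp.content, overwrite=True)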
my PDFs are already in bytes before parsing with LlamaParse, so I'm already pushing them, as images, to azure blob storage concurrently. What I would like to know is: if the TextNodes contain blob file paths as metadata, how can I get the custom query engine to access them?
This is the agent response:
Plain Text
Added user message to memory: Extract all line items from each page and return them all in a valid JSON schema. For
each page verify all line items in the page before returning an answer.
=== Calling Function ===
Calling function: count_tool with args: {"input": "page 1"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== Calling Function ===
Calling function: table_tool with args: {"input": "page 1"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== Calling Function ===
Calling function: table_tool with args: {"input": "page 2"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== Calling Function ===
Calling function: table_tool with args: {"input": "page 3"}
=== Function Output ===
Encountered error: [Errno 22] Invalid argument: 'https://<resource_name>.blob.core.windows.net/<container_name>/<folder_1>/<folder_2>/page_2.jpg'
=== LLM Response ===
It seems there was an error while trying to extract the line items from the pages. The error indicates an invalid argument related to accessing the document pages. Please check the document source or provide a valid input for further processing.
Do I need an additional tool that gives access to blob storage, or do I have to include a reader within my custom multimodal query engine?
Yea so the textnodes have it in the metadata. I think you'd need some custom step after retrieval to use those paths to fetch the images properly
ok so i need a custom retriever which contains the VectorStoreIndex and the custom step you mentioned. How can I write that?
Not custom retrieval (or it could be, I guess), but some step after running normal retrieval
Maybe you can wrap the existing retriever into a custom retriever, like this
https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers/
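As a rough sketch of that wrapping idea (the class name AzureImageRetriever and the image_path metadata key are assumptions based on this thread, not a stock class):

Plain Text
from typing import List

from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import ImageNode, NodeWithScore


class AzureImageRetriever(BaseRetriever):
    """Wraps a normal vector retriever; after retrieval, turns any blob
    image paths found in node metadata into ImageNode results."""

    def __init__(self, vector_retriever: BaseRetriever):
        self._vector_retriever = vector_retriever
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # run normal retrieval first
        text_nodes = self._vector_retriever.retrieve(query_bundle)

        # the "custom step": surface each node's image path as an ImageNode
        # (fetching the blob itself could also happen here)
        image_nodes = [
            NodeWithScore(
                node=ImageNode(image_path=n.metadata["image_path"]), score=n.score
            )
            for n in text_nodes
            if "image_path" in n.metadata
        ]
        return text_nodes + image_nodes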
on a high level, it's gonna look something like this?

Plain Text
index = VectorStoreIndex(text_nodes, embed_model=embed_model)
vector_retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
# image_retriever = ...  (some image retriever instance)
custom_retriever = CustomRetriever(vector_retriever, image_retriever)
query_engine = MultimodalQueryEngine(
    retriever=custom_retriever,
    multi_modal_llm=gpt_4o,
)
sorry how do i do this?
i see in the documentation there is BaseImageRetriever
Oh hmmm, I guess if using the multimodal query engine, you'll need to use BaseImageRetriever πŸ€”

Tbh you could just skip the query engine too (it's not doing much). Most of our latest multi-modal examples just use the llm directly.

see here
https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
That example will probably be helpful
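To illustrate the "use the llm directly" route, a hedged sketch (it assumes the images are already available at local paths; blob-hosted images would still need the download step discussed above):

Plain Text
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

gpt_4o = OpenAIMultiModal(model="gpt-4o")

# retrieve as usual, then hand text + images straight to the LLM,
# with no query engine in between
nodes = vector_retriever.retrieve("Extract all line items from page 1")
image_docs = [
    ImageDocument(image_path=n.metadata["image_path"])
    for n in nodes
    if "image_path" in n.metadata
]
context = "\n\n".join(n.node.get_content() for n in nodes)
response = gpt_4o.complete(
    prompt=f"Context:\n{context}\n\nExtract all line items from page 1.",
    image_documents=image_docs,
)
print(response.text)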
that example uses a query engine too πŸ˜΅β€πŸ’«
i also plan to extend the code with agents/tools so i might need a query engine?
I got this from RunLLM on your documentation, gonna try that out.

Plain Text
from pathlib import Path

from azure.storage.blob import BlobServiceClient
from llama_index.core.base.response.schema import Response
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.schema import ImageNode, NodeWithScore

class MultimodalQueryEngine(CustomQueryEngine):
    # ... existing code ...

    def custom_query(self, query_str: str) -> Response:
        # retrieve text nodes
        nodes = self.retriever.retrieve(query_str)

        # Initialize Azure Blob Service Client
        blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")

        # create ImageNode items from text nodes
        image_nodes = []
        for n in nodes:
            if "image_path" in n.metadata:
                try:
                    # Download the image from Azure Blob Storage. Note that
                    # blob= expects the blob's name/path within the container,
                    # not the full https:// URL.
                    blob_client = blob_service_client.get_blob_client(
                        container="your_container", blob=n.metadata["image_path"]
                    )
                    # Write to a local file named after the blob -- opening the
                    # full URL as a file path is what raised [Errno 22] above.
                    local_path = Path(n.metadata["image_path"]).name
                    with open(local_path, "wb") as download_file:
                        download_file.write(blob_client.download_blob().readall())

                    # Point the ImageNode at the local copy
                    image_nodes.append(
                        NodeWithScore(node=ImageNode(image_path=local_path))
                    )
                except Exception as e:
                    print(
                        f'Warning: Failed to create ImageNode from {n.metadata["image_path"]}: {str(e)}'
                    )
                    continue

...
@Logan M finally got it to work. is there a way to check if the images are really being used?
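One crude way to check, as a sketch (not a built-in feature): log the image nodes inside custom_query right before the LLM call, and compare the answer against a run where the images are withheld.

Plain Text
# inside MultimodalQueryEngine.custom_query, just before calling the LLM
print(f"Passing {len(image_nodes)} image(s) to the LLM:")
for img in image_nodes:
    print(" -", img.node.image_path)

# crude A/B test: re-run the same query with image_nodes forced to [] --
# if the answer degrades, the images were being used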