At a glance

The community member has questions about preparing a knowledge base for a multimodal RAG (Retrieval Augmented Generation) application. They are using LlamaIndex's MarkdownElementNodeParser, which separates text and tables into IndexNode and BaseNode objects. The community member wants to add page-number and image-path metadata to these nodes, but the sequence of the nodes is already jumbled after parsing.

In the comments, another community member suggests that the initial input documents should already have the page number and image path metadata attached, so that the nodes inherit it. The community member then provides their current setup and parsing steps, where they are downloading a PDF file from Azure blob storage and passing it to the parsing service. They note that the get_json_result() method does not return a Document object, but rather a bunch of information that can be used to construct a Document object.

The community members discuss whether the load_data() method should be used instead, but note that some metadata may be lost. The recommendation is to use the JSON result and construct Document objects from it, so that any desired information can be included.

There is no explicitly marked answer in the comments.

Hello there, I have some questions about preparing a knowledge base for a multimodal RAG application.

Referencing this guide: https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
It iterates through each page, creates a TextNode, and adds the page number and image path as metadata.
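
In code, that pattern looks roughly like this (a minimal sketch, not the notebook's verbatim code; the data_images folder and the page_num/image_path key names are assumptions):

Plain Text
from pathlib import Path

from llama_index.core.schema import TextNode

# one TextNode per page image, carrying page number and image path as metadata
text_nodes = []
for idx, image_path in enumerate(sorted(Path("data_images").glob("*.jpg"))):
    text_nodes.append(
        TextNode(
            text="",  # page text can be filled in from the parsed markdown
            metadata={"page_num": idx + 1, "image_path": str(image_path)},
        )
    )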

In my case, I am using MarkdownElementNodeParser, which separates text and tables into IndexNode and BaseNode objects. Similarly, I would like to add the page number and image path to these nodes' metadata, but the sequence of the nodes is already jumbled up from line 2 onwards. So how can I still add the page number and image path to them? Thanks

Plain Text
[1] node_parser = MarkdownElementNodeParser(llm=llm)
[2] nodes = node_parser.get_nodes_from_documents([document])
[3] base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
6 comments
Your initial input documents should probably already be per-page, with the metadata attached, so that the nodes inherit it.
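
For example, a sketch (assuming you already have per-page markdown text and image paths in hand; the pages iterable and the metadata key names here are illustrative, not a fixed API):

Plain Text
from llama_index.core import Document

# one Document per page; the node parser copies document metadata onto its nodes
documents = [
    Document(
        text=page_md,
        metadata={"page_num": page_num, "image_path": image_path},
    )
    for page_num, page_md, image_path in pages
]

nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
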
So before running line 2 of the code, document (the Document object from LlamaParse) should already have the page number and image path as metadata?

My document exists in Azure blob storage, and I am passing the file as bytes to the parsing service via .get_json_result().

For context, this is my setup and parsing steps:
Plain Text
import os
import tempfile

from llama_parse import LlamaParse

# define LlamaParse parameters
parser_params = {
    'api_key': LLAMA_CLOUD_API_KEY,
    'result_type': 'markdown',
}

parser = LlamaParse(**parser_params)
extra_info = {"file_name": blob_name}

# code to download blob as bytes


# write document stream to temp file
with tempfile.NamedTemporaryFile(suffix='.pdf', delete=False) as temp_file:
    document_stream.seek(0)
    temp_file.write(document_stream.read())
    temp_file_path = temp_file.name

# parse document
document = parser.get_json_result(temp_file_path, extra_info=extra_info)

# clean up temp file
os.remove(temp_file_path)
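
For reference, the elided blob-download step could look something like this (a sketch using azure-storage-blob; the connection string, container, and blob names are placeholders, not your actual setup):

Plain Text
import io

from azure.storage.blob import BlobServiceClient

# download the PDF from Azure blob storage into an in-memory stream
blob_service = BlobServiceClient.from_connection_string(AZURE_CONNECTION_STRING)
blob_client = blob_service.get_blob_client(container=CONTAINER_NAME, blob=blob_name)
document_stream = io.BytesIO(blob_client.download_blob().readall())
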
get_json_result doesn't return a Document object, but rather a bunch of info that you can use to construct a Document.
I should use load_data then?
Maybe? I think you lose out on some metadata, though. I would use the JSON result and construct Document objects from it; that way you can include any info you want.
This is a thorough example of what the JSON result returns:
https://github.com/run-llama/llama_parse/blob/main/examples/demo_json_tour.ipynb

From there, you could construct Document objects: Document(text=text, metadata={'key': 'val', ...})
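
Putting that together with the setup above, a sketch (assuming the JSON schema shown in the tour notebook, where the result is a list of dicts each holding a "pages" list, and each page dict carries "page" and "md" keys; image paths would come from a separate download step such as parser.get_images):

Plain Text
from llama_index.core import Document

json_objs = parser.get_json_result(temp_file_path, extra_info=extra_info)
pages = json_objs[0]["pages"]  # one dict per page: 'page', 'md', 'text', ...

documents = [
    Document(
        text=page["md"],
        metadata={
            "file_name": blob_name,
            "page_num": page["page"],
            # add the page's image path here once images are downloaded
        },
    )
    for page in pages
]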