----------------------
To extract tables and text from a markdown file and load them into LlamaIndex documents and nodes, you can use the `MarkdownElementNodeParser` class from LlamaIndex. Here's an example of how to do it:
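First, initialize the `MarkdownElementNodeParser`. The parser uses an LLM to summarize every table it finds; you can pass one with the `llm` argument, otherwise it falls back to the global `Settings.llm`:

```python
from llama_index.core.node_parser import MarkdownElementNodeParser

# Tables are summarized by an LLM; without llm=..., Settings.llm is used
md_node_parser = MarkdownElementNodeParser()
```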
Then, use the `get_nodes_from_documents` method to extract elements from the markdown documents:

```python
md_nodes = md_node_parser.get_nodes_from_documents(md_documents)
```
In the code above, `md_documents` is a list of `Document` objects that you want to parse. You can load markdown files into `Document` objects using the `MarkdownReader` class, which ships in the `llama-index-readers-file` package:
```python
from pathlib import Path

from llama_index.readers.file import MarkdownReader

reader = MarkdownReader()
# Returns a list of Document objects (the file may be split into several sections)
md_documents = reader.load_data(Path("your_markdown_file.md"))
```
Replace `"your_markdown_file.md"` with the path to your markdown file.
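To see why `md_documents` can hold more than one `Document`, here is a minimal sketch that splits markdown text into sections on its headers, a simplified stand-in for the header-based splitting `MarkdownReader` performs. `split_on_headers` is a hypothetical helper written in plain Python for illustration; it is not LlamaIndex code and ignores details such as hyperlink or image removal.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper, NOT MarkdownReader's implementation.
HEADER_RE = re.compile(r"^#+\s")  # ATX headers: "# ", "## ", ...
FENCE = "`" * 3                   # a fenced-code delimiter line


def split_on_headers(markdown: str) -> List[Tuple[Optional[str], str]]:
    """Return (header, body) pairs; text before the first header gets header None."""
    sections: List[Tuple[Optional[str], str]] = []
    header: Optional[str] = None
    body: List[str] = []
    in_code_block = False

    def flush() -> None:
        text = "\n".join(body).strip()
        # Skip an empty preamble before the first header
        if header is not None or text:
            sections.append((header, text))

    for line in markdown.splitlines():
        # "#" lines inside fenced code blocks are not treated as headers
        if line.lstrip().startswith(FENCE):
            in_code_block = not in_code_block
        if not in_code_block and HEADER_RE.match(line):
            flush()
            header, body = line.strip(), []
        else:
            body.append(line)
    flush()
    return sections
```

For example, `split_on_headers("Intro\n# Title\nHello")` yields two sections, `(None, "Intro")` and `("# Title", "Hello")`, much as a reader that sections on headers would produce two documents.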
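The `get_nodes_from_documents` method returns a flat list of nodes. Internally, the parser first splits each document into `Element` objects, each holding one piece of content such as a run of text or a table. Text elements become `TextNode` objects, while each table is represented by an `IndexNode` that carries an LLM-generated summary of the table and links to a node containing the table itself. To separate the regular nodes from these table objects, for example before building an index, you can call `md_node_parser.get_nodes_and_objects(md_nodes)`, which returns a `(base_nodes, objects)` tuple.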
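The split between text and table elements can be illustrated with a small, self-contained sketch. `split_elements` and `MdElement` are hypothetical names; they mimic the idea of the parser's element step under the assumption that tables use GitHub-style `|` pipe syntax, and they do not reproduce LlamaIndex's actual logic or its LLM table summaries.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified stand-in for the parser's element step;
# NOT MarkdownElementNodeParser's code, just the same idea in plain Python.

# A pipe-table delimiter row, e.g. "|---|:---:|" or "--- | ---"
DELIMITER_RE = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


@dataclass
class MdElement:
    type: str  # "text" or "table"
    content: str


def split_elements(markdown: str) -> List[MdElement]:
    """Split markdown into alternating text and pipe-table elements."""
    lines = markdown.splitlines()
    elements: List[MdElement] = []
    text_buf: List[str] = []
    i = 0

    def flush_text() -> None:
        text = "\n".join(text_buf).strip()
        if text:
            elements.append(MdElement("text", text))
        text_buf.clear()

    while i < len(lines):
        # A table starts with a row containing "|" followed by a delimiter row
        if "|" in lines[i] and i + 1 < len(lines) and DELIMITER_RE.match(lines[i + 1]):
            flush_text()
            table = [lines[i], lines[i + 1]]
            i += 2
            # Consume body rows until a line without a pipe
            while i < len(lines) and "|" in lines[i]:
                table.append(lines[i])
                i += 1
            elements.append(MdElement("table", "\n".join(table)))
        else:
            text_buf.append(lines[i])
            i += 1
    flush_text()
    return elements
```

In the real parser, each table element is additionally summarized by the LLM, which is why the parser needs one; this sketch stops at the text/table split.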
You can find more details in the LlamaIndex documentation.
----------------------