```python
from llama_index.core.node_parser import UnstructuredElementNodeParser
import os
import pickle

node_parser = UnstructuredElementNodeParser()

# Cache the parsed nodes so the expensive parsing step only runs once.
if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    pickle.dump(raw_nodes_2021, open("2021_nodes.pkl", "wb"))
else:
    raw_nodes_2021 = pickle.load(open("2021_nodes.pkl", "rb"))
```
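The if/else caching above can be factored into a small reusable helper. `load_or_build` is a name of my own for illustration, not a LlamaIndex API:

```python
import os
import pickle

def load_or_build(path, builder):
    """Return the pickled object at `path`; on a cache miss, call
    `builder()` to compute it and pickle the result for next time."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = builder()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

# Usage, mirroring the snippet above:
# raw_nodes_2021 = load_or_build(
#     "2021_nodes.pkl",
#     lambda: node_parser.get_nodes_from_documents(docs_2021),
# )
```

This keeps the expensive parse-and-pickle logic in one place if you later cache nodes for other years as well.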
To use your Azure OpenAI model with the `UnstructuredElementNodeParser`, you need to set the LLM globally using the `Settings` class in LlamaIndex. Here's how you can modify your code:

```python
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import Settings

# Define your Azure OpenAI model
azure_llm = AzureOpenAI(
    engine="your-engine-name",  # Replace with your Azure deployment name
    model="gpt-4o-mini",        # Replace with your actual model name
    api_key="your-api-key",
    azure_endpoint="https://your-resource-name.openai.azure.com/",
    api_version="2023-07-01-preview",
)

# Set the LLM globally
Settings.llm = azure_llm
```
Then use the `UnstructuredElementNodeParser` as you did before:

```python
from llama_index.core.node_parser import UnstructuredElementNodeParser
import os
import pickle

node_parser = UnstructuredElementNodeParser()

if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    pickle.dump(raw_nodes_2021, open("2021_nodes.pkl", "wb"))
else:
    raw_nodes_2021 = pickle.load(open("2021_nodes.pkl", "rb"))
```
Once `Settings.llm = azure_llm` is set, all components of LlamaIndex, including the `UnstructuredElementNodeParser`, will use your specified Azure OpenAI model instead of the default OpenAI model. Remember to replace `"your-engine-name"`, `"your-api-key"`, and `"your-resource-name"` with your actual Azure OpenAI deployment details.
Alternatively, you can use the `UnstructuredElementNodeParser` without setting the global `Settings.llm`: pass the LLM directly to the parser when initializing it. Here's how you can modify your code to achieve this:

```python
from llama_index.llms.azure_openai import AzureOpenAI

azure_llm = AzureOpenAI(
    engine="your-engine-name",  # Replace with your Azure deployment name
    model="gpt-4o-mini",        # Replace with your actual model name
    api_key="your-api-key",
    azure_endpoint="https://your-resource-name.openai.azure.com/",
    api_version="2023-07-01-preview",
)
```
Then pass `azure_llm` to the `UnstructuredElementNodeParser`:

```python
from llama_index.core.node_parser import UnstructuredElementNodeParser
import os
import pickle

node_parser = UnstructuredElementNodeParser(llm=azure_llm)

if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    pickle.dump(raw_nodes_2021, open("2021_nodes.pkl", "wb"))
else:
    raw_nodes_2021 = pickle.load(open("2021_nodes.pkl", "rb"))
```
By passing `azure_llm` directly to the `UnstructuredElementNodeParser` constructor via the `llm` parameter, you tell this specific parser instance to use your Azure OpenAI model without affecting the global settings. As before, replace `"your-engine-name"`, `"your-api-key"`, and `"your-resource-name"` with your actual Azure OpenAI deployment details.
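The difference between the two approaches is a common configuration pattern: an instance-level setting overrides a module-level default. A toy sketch of that pattern (this is NOT LlamaIndex's internal code; the class and attribute names are illustrative only):

```python
# Toy sketch of the "instance setting overrides global default" pattern,
# mirroring how a per-parser llm= argument beats the global Settings.llm.
class Settings:
    llm = "global-default-llm"   # stands in for the globally configured LLM

class Parser:
    def __init__(self, llm=None):
        self._llm = llm          # per-instance override, if any

    @property
    def llm(self):
        # Fall back to the global default only when no LLM was passed in.
        return self._llm if self._llm is not None else Settings.llm
```

An instance constructed with an explicit `llm` keeps it; every other instance follows whatever the global default currently holds.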
To import `IndexNode` in LlamaIndex, you can use the following import statement:

```python
from llama_index.core.schema import IndexNode
```
`IndexNode` is imported from the `llama_index.core.schema` module, and this import lets you use the `IndexNode` class in your LlamaIndex project.
To parse tables, you can use the `MarkdownElementNodeParser`. It is specifically designed to handle markdown content, including tables, and can parse a markdown table into a more structured format:

```python
from llama_index.core.node_parser import MarkdownElementNodeParser

node_parser = MarkdownElementNodeParser()
```
When you use the `MarkdownElementNodeParser`, it handles the table parsing automatically. Perfect tables (those with consistent column counts) are converted to a dataframe; non-perfect tables are stored as raw text with a different node type to differentiate them from perfect tables.

```python
from llama_index.core import Document

# Assuming your markdown content is in a string called 'markdown_content';
# the parser expects Document objects, not raw strings.
nodes = node_parser.get_nodes_from_documents([Document(text=markdown_content)])
```
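What "perfect" means here, every row having the same number of cells, can be sketched with a small stdlib-only check. This helper is my own illustration of the idea, not part of LlamaIndex:

```python
def is_perfect_markdown_table(table_text):
    """Return True if every data row of a pipe-delimited markdown table
    has the same number of cells; the |---|---| separator row is ignored."""
    rows = []
    for line in table_text.strip().splitlines():
        line = line.strip().strip("|")
        if set(line) <= set("-:| "):   # header separator row, skip it
            continue
        rows.append([cell.strip() for cell in line.split("|")])
    return len({len(r) for r in rows}) == 1
```

A table that fails this check is the kind the parser would keep as raw text instead of converting to a dataframe.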
Finally, you can use a `VectorStoreIndex` to store the processed nodes:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(nodes)
```