```python
from llama_index.core.node_parser import UnstructuredElementNodeParser
import os
import pickle

node_parser = UnstructuredElementNodeParser()

# Cache the parsed nodes so the expensive parsing step only runs once.
if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    pickle.dump(raw_nodes_2021, open("2021_nodes.pkl", "wb"))
else:
    raw_nodes_2021 = pickle.load(open("2021_nodes.pkl", "rb"))
```
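The if/else caching above can be factored into a small reusable helper. `load_or_build` is a name of my own for illustration, not a LlamaIndex API:

```python
import os
import pickle

def load_or_build(path, builder):
    """Return the pickled object at `path`; on a cache miss, call
    `builder()` to compute it and pickle the result for next time."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = builder()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

# Usage, mirroring the snippet above:
# raw_nodes_2021 = load_or_build(
#     "2021_nodes.pkl",
#     lambda: node_parser.get_nodes_from_documents(docs_2021),
# )
```

This keeps the expensive parse-and-pickle logic in one place if you later cache nodes for other years as well.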
To use your Azure OpenAI model with the `UnstructuredElementNodeParser`, you need to set the LLM globally using the `Settings` class in LlamaIndex. Here's how you can modify your code:

```python
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import Settings

# Define your Azure OpenAI model
azure_llm = AzureOpenAI(
    engine="your-engine-name",  # Replace with your Azure deployment name
    model="gpt-4o-mini",        # Replace with your actual model name
    api_key="your-api-key",
    azure_endpoint="https://your-resource-name.openai.azure.com/",
    api_version="2023-07-01-preview",
)

# Set the LLM globally
Settings.llm = azure_llm
```
Then use the `UnstructuredElementNodeParser` as you did before:

```python
from llama_index.core.node_parser import UnstructuredElementNodeParser
import os
import pickle

node_parser = UnstructuredElementNodeParser()

if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    pickle.dump(raw_nodes_2021, open("2021_nodes.pkl", "wb"))
else:
    raw_nodes_2021 = pickle.load(open("2021_nodes.pkl", "rb"))
```
Once `Settings.llm = azure_llm` is set, all components of LlamaIndex, including the `UnstructuredElementNodeParser`, will use your specified Azure OpenAI model instead of the default OpenAI model. Remember to replace `"your-engine-name"`, `"your-api-key"`, and `"your-resource-name"` with your actual Azure OpenAI deployment details.
Alternatively, you can use the `UnstructuredElementNodeParser` without setting the global `Settings.llm`: pass the LLM directly to the parser when initializing it. Here's how you can modify your code to achieve this:

```python
from llama_index.llms.azure_openai import AzureOpenAI

azure_llm = AzureOpenAI(
    engine="your-engine-name",  # Replace with your Azure deployment name
    model="gpt-4o-mini",        # Replace with your actual model name
    api_key="your-api-key",
    azure_endpoint="https://your-resource-name.openai.azure.com/",
    api_version="2023-07-01-preview",
)
```
Then pass `azure_llm` to the `UnstructuredElementNodeParser`:

```python
from llama_index.core.node_parser import UnstructuredElementNodeParser
import os
import pickle

node_parser = UnstructuredElementNodeParser(llm=azure_llm)

if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    pickle.dump(raw_nodes_2021, open("2021_nodes.pkl", "wb"))
else:
    raw_nodes_2021 = pickle.load(open("2021_nodes.pkl", "rb"))
```
By passing `azure_llm` directly to the `UnstructuredElementNodeParser` constructor via the `llm` parameter, you tell this specific parser instance to use your Azure OpenAI model without affecting the global settings. As before, replace `"your-engine-name"`, `"your-api-key"`, and `"your-resource-name"` with your actual Azure OpenAI deployment details.
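The difference between the two approaches is a common configuration pattern: an instance-level setting overrides a module-level default. A toy sketch of that pattern (this is NOT LlamaIndex's internal code; the class and attribute names are illustrative only):

```python
# Toy sketch of the "instance setting overrides global default" pattern,
# mirroring how a per-parser llm= argument beats the global Settings.llm.
class Settings:
    llm = "global-default-llm"   # stands in for the globally configured LLM

class Parser:
    def __init__(self, llm=None):
        self._llm = llm          # per-instance override, if any

    @property
    def llm(self):
        # Fall back to the global default only when no LLM was passed in.
        return self._llm if self._llm is not None else Settings.llm
```

An instance constructed with an explicit `llm` keeps it; every other instance follows whatever the global default currently holds.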
To import `IndexNode` in LlamaIndex, you can use the following import statement:

```python
from llama_index.core.schema import IndexNode
```
`IndexNode` is imported from the `llama_index.core.schema` module, and this import lets you use the `IndexNode` class in your LlamaIndex project.
To parse tables, you can use the `MarkdownElementNodeParser`. It is specifically designed to handle markdown content, including tables, and can parse a markdown table into a more structured format:

```python
from llama_index.core.node_parser import MarkdownElementNodeParser

node_parser = MarkdownElementNodeParser()
```
When you use the `MarkdownElementNodeParser`, it handles the table parsing automatically. Perfect tables (those with consistent column counts) are converted to a dataframe; non-perfect tables are stored as raw text with a different node type to differentiate them from perfect tables.

```python
from llama_index.core import Document

# Assuming your markdown content is in a string called 'markdown_content';
# the parser expects Document objects, not raw strings.
nodes = node_parser.get_nodes_from_documents([Document(text=markdown_content)])
```
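What "perfect" means here, every row having the same number of cells, can be sketched with a small stdlib-only check. This helper is my own illustration of the idea, not part of LlamaIndex:

```python
def is_perfect_markdown_table(table_text):
    """Return True if every data row of a pipe-delimited markdown table
    has the same number of cells; the |---|---| separator row is ignored."""
    rows = []
    for line in table_text.strip().splitlines():
        line = line.strip().strip("|")
        if set(line) <= set("-:| "):   # header separator row, skip it
            continue
        rows.append([cell.strip() for cell in line.split("|")])
    return len({len(r) for r in rows}) == 1
```

A table that fails this check is the kind the parser would keep as raw text instead of converting to a dataframe.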
Finally, you can use a `VectorStoreIndex` to store the processed nodes:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(nodes)
```