
Updated 5 months ago

how to convert Document to base nodes?

@kapa.ai how to convert Document to base nodes?
10 comments
Please stand by @ispan3tz as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To convert a Document into base nodes in LlamaIndex, you can use the get_nodes_from_documents() method of a NodeParser. Here's a general example:

Plain Text
from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(docs)


In this example, SimpleNodeParser is a type of NodeParser and docs is a list of Document objects. The chunk_size parameter determines the size (in tokens) of the chunks that each document will be split into.
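To see what chunk-size-based splitting does conceptually, here is a minimal pure-Python sketch (this is an illustration, not LlamaIndex code: the real parsers count tokens rather than characters, support a chunk_overlap parameter, and attach metadata and relationships to each resulting node):

```python
# Minimal sketch of fixed-size chunking with overlap.
# Real node parsers count tokens, not characters.

def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into pieces of at most chunk_size characters,
    with consecutive pieces sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

The overlap matters in practice: without it, a sentence cut at a chunk boundary loses context on both sides, which hurts retrieval quality.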

You can also use other types of NodeParsers, such as SentenceSplitter:

Plain Text
from llama_index.core.node_parser import SentenceSplitter

node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(docs)


In this case, the splitter still produces chunks of up to chunk_size tokens, but it prefers to break at sentence boundaries rather than cutting mid-sentence.
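Conceptually, sentence-aware splitting packs whole sentences greedily into chunks up to the size limit. A rough stdlib-only sketch (an illustration only; the real SentenceSplitter tokenizes properly and handles abbreviations, overlap, and many edge cases):

```python
import re

def split_by_sentences(text: str, chunk_size: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most
    chunk_size characters; an oversized sentence gets its own chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sent in sentences:
        candidate = f"{current} {sent}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = sent
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

print(split_by_sentences("One. Two two. Three.", chunk_size=14))
# ['One. Two two.', 'Three.']
```

Note how "One." and "Two two." share a chunk because they fit together, while "Three." starts a new one; a plain fixed-size splitter would instead cut at exactly 14 characters, mid-sentence.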

Remember to replace docs with your actual documents. If your documents are stored in a directory, you can load them using SimpleDirectoryReader:

Plain Text
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham").load_data()


Then you can pass documents to get_nodes_from_documents().

Sources:
  • Param_optimizer
  • Recursive_retriever_nodes_braintrust
  • BM25 Retriever
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai how to parse Document to NodeType?
Let me check the knowledge base for you. One moment! :books:
----------------------
To parse a Document into nodes in LlamaIndex, you can use the get_nodes_from_documents() method of a node parser such as SentenceSplitter from the llama_index.core.node_parser module. Here's an example:

Plain Text
from llama_index.core.node_parser import SentenceSplitter

nodes = SentenceSplitter().get_nodes_from_documents(documents)


In this example, documents is your input data. The get_nodes_from_documents() method will parse these documents into nodes.

Please note that you need to replace documents with your actual documents. If your documents are stored in a directory, you can load them using SimpleDirectoryReader:

Plain Text
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham").load_data()


Then you can pass documents to get_nodes_from_documents().
Sources:
  • Docstore Demo
  • Firestore Demo
  • Redis Docstore+Index Store Demo
  • Dynamo DB Docstore Demo
  • Firestore Demo on GitHub
  • MongoDB Demo
  • Azure Table Storage as a Docstore Demo
  • UpTrain Callback Handler on GitHub
  • Dynamo DB Docstore Demo on GitHub
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai how to define metadata for IndexNode?
Let me check the knowledge base for you. One moment! :books:
----------------------
To define metadata for an IndexNode in LlamaIndex, you can include a metadata dictionary when creating the IndexNode. Here's an example:

Plain Text
from llama_index.core.schema import IndexNode

index_node = IndexNode(
    text="The text of the node",
    index_id="node_id",
    metadata={
        "key1": "value1",
        "key2": "value2",
        # Add more key-value pairs as needed
    }
)


In this example, text is the text of the node, index_id is the identifier of the index or node that this IndexNode points to (used, for example, by recursive retrievers to follow the reference), and metadata is a dictionary containing the metadata. You can include any key-value pairs you need in the metadata dictionary.
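As a rough sketch of what metadata does downstream: by default, LlamaIndex renders metadata as "key: value" lines prepended to the node text when building the string sent to the embedding model or LLM, and specific keys can be excluded (via attributes like excluded_embed_metadata_keys). The stdlib-only sketch below illustrates that idea; it is an approximation, not the library's actual implementation:

```python
# Illustrative sketch: metadata is a plain dict whose entries are,
# by default, rendered as "key: value" lines above the node text.

def render_node_content(text: str, metadata: dict,
                        excluded_keys: frozenset = frozenset()) -> str:
    """Build the string an embedding model would see for this node."""
    lines = [f"{k}: {v}" for k, v in metadata.items() if k not in excluded_keys]
    return "\n".join(lines + [text]) if lines else text

content = render_node_content(
    "The text of the node",
    {"key1": "value1", "key2": "value2"},
)
print(content)
# key1: value1
# key2: value2
# The text of the node
```

This is why metadata keys should be short and meaningful: they are embedded along with the text and can shift what the node matches at retrieval time.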

Here's another example from the LlamaIndex documentation, where metadata is defined for each node in a list of nodes:

Plain Text
from llama_index.core.schema import TextNode
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.duckdb import DuckDBVectorStore

nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={
            "author": "Stephen King",
            "theme": "Friendship",
            "year": 1994,
            "ref_doc_id": "doc_1",
        },
    ),
    # Add more nodes as needed
]

vector_store = DuckDBVectorStore()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)


In this example, each TextNode in the nodes list has a metadata dictionary.
Sources:
  • Multi_doc_auto_retrieval
  • Metadata filter example
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------