Use the get_nodes_from_documents() method of a NodeParser. Here's a general example:

    from llama_index.core.node_parser import SimpleNodeParser

    # chunk_size and docs are assumed to be defined by the caller
    node_parser = SimpleNodeParser.from_defaults(chunk_size=chunk_size)
    base_nodes = node_parser.get_nodes_from_documents(docs)

Here, SimpleNodeParser is a type of NodeParser and docs is a list of Document objects. The chunk_size parameter determines the size, in tokens, of the chunks that each document will be split into.

You can also use SentenceSplitter:

    from llama_index.core.node_parser import SentenceSplitter

    node_parser = SentenceSplitter(chunk_size=1024)
    base_nodes = node_parser.get_nodes_from_documents(docs)

Replace docs with your actual documents. If your documents are stored in a directory, you can load them using SimpleDirectoryReader:

    from llama_index.core import SimpleDirectoryReader

    documents = SimpleDirectoryReader("./data/paul_graham").load_data()

Then pass documents to get_nodes_from_documents().
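
To make the effect of chunk_size concrete, here is a minimal, library-free sketch. It is not how LlamaIndex splits text: it counts whitespace-separated words, whereas SentenceSplitter counts tokens and prefers sentence boundaries. The function name split_into_chunks is hypothetical and exists only for this illustration.

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size words (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    # Step through the word list in strides of chunk_size.
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


sample = "one two three four five six seven"

# A small chunk_size yields many short chunks ...
small_chunks = split_into_chunks(sample, chunk_size=3)

# ... while a chunk_size larger than the text yields a single chunk.
large_chunks = split_into_chunks(sample, chunk_size=10)
```

The same trade-off applies to a real node parser: a smaller chunk_size produces more, shorter nodes, and a larger one produces fewer, longer nodes.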

You can use the get_nodes_from_documents() method from the SentenceSplitter class in the llama_index.core.node_parser module. Here's an example:

    from llama_index.core.node_parser import SentenceSplitter

    nodes = SentenceSplitter().get_nodes_from_documents(documents)

Here, documents is your input data. The get_nodes_from_documents() method will parse these documents into nodes.

Replace documents with your actual documents. If your documents are stored in a directory, you can load them using SimpleDirectoryReader:

    from llama_index.core import SimpleDirectoryReader

    documents = SimpleDirectoryReader("./data/paul_graham").load_data()

Then pass documents to get_nodes_from_documents().

To add metadata to an IndexNode in LlamaIndex, you can include a metadata dictionary when creating the IndexNode. Here's an example:

    from llama_index.core.schema import IndexNode

    index_node = IndexNode(
        text="The text of the node",
        index_id="node_id",
        metadata={
            "key1": "value1",
            "key2": "value2",
            # Add more key-value pairs as needed
        },
    )

Here, text is the text of the node, index_id is the identifier of the index (or other object) that the node points to, and metadata is a dictionary containing the metadata. You can include any key-value pairs you need in the metadata dictionary.

    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.core.schema import TextNode
    # DuckDBVectorStore ships in the separate llama-index-vector-stores-duckdb package
    from llama_index.vector_stores.duckdb import DuckDBVectorStore

    nodes = [
        TextNode(
            text="The Shawshank Redemption",
            metadata={
                "author": "Stephen King",
                "theme": "Friendship",
                "year": 1994,
                "ref_doc_id": "doc_1",
            },
        ),
        # Add more nodes as needed
    ]

    vector_store = DuckDBVectorStore()
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex(nodes, storage_context=storage_context)

In this example, each TextNode in the nodes list has a metadata dictionary.
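
One reason to attach metadata is that it lets you select nodes by their attributes. The sketch below illustrates the idea without the library: plain dictionaries stand in for TextNode objects, the second entry is placeholder data, and filter_by_metadata is a hypothetical helper, not a LlamaIndex function.

```python
from typing import Any

# Plain dicts standing in for nodes; only the first mirrors the example above.
sketch_nodes: list[dict[str, Any]] = [
    {
        "text": "The Shawshank Redemption",
        "metadata": {"author": "Stephen King", "theme": "Friendship", "year": 1994},
    },
    {
        "text": "Placeholder node with different metadata",
        "metadata": {"author": "Unknown"},
    },
]


def filter_by_metadata(
    items: list[dict[str, Any]], key: str, value: Any
) -> list[dict[str, Any]]:
    """Return the items whose metadata has key set to value (hypothetical helper)."""
    return [item for item in items if item["metadata"].get(key) == value]


# Keep only the nodes whose metadata records the year 1994.
matches = filter_by_metadata(sketch_nodes, "year", 1994)
```

Nodes that lack the key are simply skipped, because dict.get returns None for a missing key.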