@kapa.ai is this correct?
Plain Text
from llama_index.node_parser.extractors import (
    MetadataExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor
)
from llama_index.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)

node_parser = SimpleNodeParser(
    text_splitter = text_splitter,
    metadata_extractor=metadata_extractor
)
Please stand by @andysingal as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The code you provided mixes pieces from different parts of the extracts, and it isn't entirely correct according to the information they contain. Here's a corrected version based on the extracts:

Plain Text
from llama_index.node_parser import TokenTextSplitter
from llama_index.extractors import (
    TitleExtractor,
    QuestionsAnsweredExtractor,
)
from llama_index.ingestion import IngestionPipeline

text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=128
)
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)

pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)

nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)

This code first imports the necessary modules, then creates a TokenTextSplitter, a TitleExtractor, and a QuestionsAnsweredExtractor. These are used as transformations in an IngestionPipeline, which is run over a set of documents to produce nodes.

Please note that the documents variable is not defined in this code snippet. You would need to define it with your own data before running the pipeline.
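For example, documents could be loaded from a local folder with SimpleDirectoryReader (a minimal sketch using the same pre-0.10 llama_index import style as the code above; the ./data path is just an assumption):

Plain Text
from llama_index import SimpleDirectoryReader

# Read every file in ./data into a list of Document objects
# (the folder path is hypothetical - point it at your own data)
documents = SimpleDirectoryReader("./data").load_data()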

For more details, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai how to further use nodes?
Plain Text
andrew_gina_nodes = node_parser.get_nodes_from_documents(andrew_gina_docs)
?
Let me check the knowledge base for you. One moment! :books:
----------------------
Once you have obtained the nodes from the documents using the get_nodes_from_documents method, you can use these nodes for various purposes depending on your needs. Here are a few examples based on the information provided in the extracts:

  1. Inspecting the Metadata of Nodes: You can inspect the metadata of each node. Here's an example of how to print the metadata of the first 5 nodes:
Plain Text
from pprint import pprint

for i in range(5):
    pprint(andrew_gina_nodes[i].metadata)

This code prints the metadata of the first 5 nodes in a readable format. The metadata holds the information attached to each node, such as the title and questions generated by the extractors.
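For instance, if the nodes were built with the TitleExtractor and QuestionsAnsweredExtractor from earlier, each metadata dict might look roughly like this (illustrative values, not real output):

Plain Text
{'document_title': 'An inferred title for the source document',
 'questions_this_excerpt_can_answer': '1. What is ...?\n2. How does ...?'}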

  2. Using Nodes in an Ingestion Pipeline: Nodes can be used as input to an ingestion pipeline. Here's an example:
Plain Text
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import TokenTextSplitter

pipeline = IngestionPipeline(transformations=[TokenTextSplitter(), ...])

nodes = pipeline.run(nodes=andrew_gina_nodes)

In this example, existing nodes are passed back into an IngestionPipeline, which applies a further series of transformations to them.
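Once the pipeline has produced its final nodes, a common next step is to build an index over them (a minimal sketch, again assuming the pre-0.10 import style; VectorStoreIndex accepts a list of nodes):

Plain Text
from llama_index import VectorStoreIndex

# Build a vector index over the transformed nodes and query it
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("your question here")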

  3. Storing Nodes in a Document Store: If you're using a document store, you can add nodes to it for later retrieval and reuse. Here's an example of how to parse documents into nodes and store them:
Plain Text
from llama_index.node_parser import SentenceSplitter

nodes = SentenceSplitter().get_nodes_from_documents(andrew_gina_docs)
docstore.add_documents(nodes)

In this example, the nodes are added to a document store using the add_documents method.
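Here, docstore is assumed to already exist; it isn't defined in the snippet. A minimal way to create one (using the same pre-0.10 import style) could be:

Plain Text
from llama_index.storage.docstore import SimpleDocumentStore

# An in-memory document store; nodes added to it can later be
# fetched by ID or reused across indexes
docstore = SimpleDocumentStore()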
For more details on how to use nodes, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------