how do I convert a Document into a Node in `llama-index...

At a glance

The community member asked how to convert a Document into a Node in llama-index/gpt_index. The responses provided two ways to do this:

1. Using the NodeParser classes, such as the SimpleNodeParser, to parse Document objects into Node objects.

2. Manually constructing Node objects and setting their relationships, such as NEXT and PREVIOUS.

The community members also discussed how to handle the first and last nodes when setting relationships, and how to restrict a LlamaIndex query to only use information in its own index by using required_keywords and exclude_keywords. Additionally, they discussed how to turn off token usage information when querying a LlamaIndex by creating a custom ServiceContext without an llm_predictor.

ryanglambert

@kapa.ai how do I convert a Document into a Node in llama-index/gpt_index

kapa.ai

Please stand by @ryanglambert as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:

kapa.ai

----------------------
To convert a Document into a Node in llama-index/gpt_index, you can use the NodeParser classes. Here's an example using the SimpleNodeParser:

Plain Text

from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

In this example, documents is a list of Document objects. The get_nodes_from_documents function will parse these Document objects into Node objects.
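
To make the Document-to-Node step more concrete, here is a minimal, library-free sketch of the general idea: split each document's text into chunks and wrap every chunk in a node that remembers which document it came from. The `Doc` and `Chunk` classes, the `split_into_nodes` function, and the fixed character-based `chunk_size` are all hypothetical stand-ins for illustration. They are not part of the llama_index API, and SimpleNodeParser's actual splitting logic is more involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: an id plus raw text."""
    doc_id: str
    text: str


@dataclass
class Chunk:
    """Hypothetical stand-in for a Node: a text chunk plus its source document id."""
    node_id: str
    text: str
    source_doc_id: str


def split_into_nodes(documents: List[Doc], chunk_size: int = 20) -> List[Chunk]:
    """Split each document into fixed-size character chunks, one node per chunk."""
    nodes: List[Chunk] = []
    for doc in documents:
        for i, start in enumerate(range(0, len(doc.text), chunk_size)):
            nodes.append(
                Chunk(
                    node_id=f"{doc.doc_id}-{i}",
                    text=doc.text[start:start + chunk_size],
                    source_doc_id=doc.doc_id,
                )
            )
    return nodes


# One 45-character document, split into chunks of at most 20 characters.
documents = [Doc(doc_id="doc1", text="a" * 45)]
nodes = split_into_nodes(documents, chunk_size=20)
```

Each resulting node keeps a pointer back to its source document, which is roughly what lets an index trace an answer back to the original Document.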

Alternatively, you can manually construct Node objects and set their relationships:

Plain Text

from llama_index.data_structs.node_v2 import Node, DocumentRelationship

# Give each node its own id so the relationships point at distinct nodes
node1 = Node(text="<text_chunk>", doc_id="<node1_id>")
node2 = Node(text="<text_chunk>", doc_id="<node2_id>")
# set relationships
node1.relationships[DocumentRelationship.NEXT] = node2.get_doc_id()
node2.relationships[DocumentRelationship.PREVIOUS] = node1.get_doc_id()

Source: LlamaIndex Usage Pattern - Parse the Documents into Nodes (https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html#parse-the-documents-into-nodes)
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

visiblenils

@kapa.ai How do I generate the index from a list of nodes?

Jacky 🇨🇦 Zhang

@kapa.ai What if node1 is the first node: how do I assign its PREVIOUS? And if node2 is the last one, what is its NEXT?

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

----------------------
If node1 is the first node, you don't need to assign a PREVIOUS relationship, as there is no previous node. Similarly, if node2 is the last node, you don't need to assign a NEXT relationship, as there is no next node. The relationships are only assigned when they exist. Here's an example:

Plain Text

from llama_index.data_structs.node_v2 import Node, DocumentRelationship

node1 = Node(text="<text_chunk>", doc_id="<node1_id>")
node2 = Node(text="<text_chunk>", doc_id="<node2_id>")

# Set relationships only when they exist
# For node1, since it's the first node, no PREVIOUS relationship is assigned
node1.relationships[DocumentRelationship.NEXT] = node2.get_doc_id()

# For node2, since it's the last node, no NEXT relationship is assigned
node2.relationships[DocumentRelationship.PREVIOUS] = node1.get_doc_id()

In this example, node1 is the first node, so it only has a NEXT relationship pointing to node2. node2 is the last node, so it only has a PREVIOUS relationship pointing to node1.
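
The same rule extends to a chain of any length. The sketch below links a list of nodes in a loop, so the first node ends up with only NEXT and the last with only PREVIOUS. It is an illustration under stated assumptions: `SimpleNode` and `link_nodes` are hypothetical stand-ins written in plain Python (with string keys instead of the DocumentRelationship enum) so the example runs without the library installed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleNode:
    """Hypothetical stand-in for Node: an id, text, and a relationships dict."""
    doc_id: str
    text: str
    relationships: Dict[str, str] = field(default_factory=dict)


def link_nodes(nodes: List[SimpleNode]) -> None:
    """Set NEXT/PREVIOUS between neighbours only; the two ends get one side each."""
    for prev_node, next_node in zip(nodes, nodes[1:]):
        prev_node.relationships["NEXT"] = next_node.doc_id
        next_node.relationships["PREVIOUS"] = prev_node.doc_id


# Three nodes in document order: node1 -> node2 -> node3
chain = [SimpleNode(doc_id=f"node{i}", text=f"chunk {i}") for i in range(1, 4)]
link_nodes(chain)
```

Pairing each node with its successor via `zip(nodes, nodes[1:])` means no special case is needed for the ends: the first node is never a successor and the last is never a predecessor.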
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

Jacky 🇨🇦 Zhang

@kapa.ai What does this error mean: ValueError: Reference doc id is None.

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

----------------------
I couldn't find any reference to the specific error "ValueError: Reference doc id is None" in the provided extracts. However, it seems like the error might be related to a missing or None value for a reference document ID when working with the GPT index.

To provide a more accurate answer, I would need more information or context about the error. In the meantime, you can refer to the GPT index documentation for more details on how to use the library: GPT Index Documentation (https://gpt-index.readthedocs.io/en/latest/)
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

Jacky 🇨🇦 Zhang

@kapa.ai How do I restrict a llama index query to only use information in its own index?

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

----------------------
To restrict a LlamaIndex query to only use information in its own index, you can set required_keywords and exclude_keywords when querying the index. This will preemptively filter out nodes that do not contain required_keywords or contain exclude_keywords, reducing the search space and hence the time/number of LLM calls/cost.

Here's an example:

Plain Text

index.query(
    "What did the author do after Y Combinator?",
    required_keywords=["Combinator"],
    exclude_keywords=["Italy"],
)

In this example, the query will only consider nodes that contain the keyword "Combinator" and exclude nodes that contain the keyword "Italy".
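
The filtering step described above can be sketched in plain Python. This is an illustration, not the library's implementation: `filter_by_keywords` is a hypothetical helper, and the use of case-sensitive substring matching is an assumption, since the thread does not say how the library matches keywords.

```python
from typing import List, Sequence


def filter_by_keywords(
    texts: List[str],
    required_keywords: Sequence[str] = (),
    exclude_keywords: Sequence[str] = (),
) -> List[str]:
    """Keep texts that contain every required keyword and none of the excluded ones."""
    kept = []
    for text in texts:
        has_required = all(kw in text for kw in required_keywords)
        has_excluded = any(kw in text for kw in exclude_keywords)
        if has_required and not has_excluded:
            kept.append(text)
    return kept


# Hypothetical node texts, made up for this example
node_texts = [
    "After Y Combinator, the author started painting.",
    "The author travelled to Italy after Y Combinator.",
    "The author wrote essays about Lisp.",
]
remaining = filter_by_keywords(
    node_texts, required_keywords=["Combinator"], exclude_keywords=["Italy"]
)
```

Only the first text survives: the second mentions "Italy" and the third lacks "Combinator". Because the filter runs before any LLM call, fewer nodes reach the model.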

Source: LlamaIndex Usage Pattern - Setting required_keywords and exclude_keywords (https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html#setting-required-keywords-and-exclude-keywords)
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

Jacky 🇨🇦 Zhang

@kapa.ai How do I turn off the token usage information in a query answer?

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

----------------------
To turn off token usage information when querying a LlamaIndex, you can create a custom ServiceContext without an llm_predictor. This will prevent the token usage information from being printed.

Here's an example:

Plain Text

from llama_index import ServiceContext

# Create a custom ServiceContext without an llm_predictor
service_context = ServiceContext.from_defaults(llm_predictor=None)

# Query the index using the custom ServiceContext
response = index.query("What did the author do after his time at Y Combinator?", service_context=service_context)

In this example, the ServiceContext is created without an llm_predictor, and the query is performed using this custom ServiceContext. This will prevent token usage information from being printed during the query.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

Jacky 🇨🇦 Zhang

@kapa.ai what if I am using storage_context = StorageContext.from_defaults(persist_dir="./noc_index1")

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

@Jacky 🇨🇦 Zhang That's a great question, but unfortunately I am only capable of responding to a maximum of three questions per thread. Don't hesitate to create a new thread for any further questions you have though. I'm here to help!
