
Updated 2 months ago

Hello, I seem to have encountered an error

Hello, I seem to have encountered an error regarding the base GPT Index class. I am generating custom nodes in a for loop. As you can see, I am including the doc_id for each node and have checked that all are filled. Yet I get this error: ValueError: Reference doc id is None. It points me to the file /site-packages/llama_index/indices/base.py and highlights the following line:

if index_struct is None:
...
--> 108 raise ValueError("Reference doc id is None.")
109 result_tups.append(
110 NodeEmbeddingResult(id, id_to_node_map[id], embed, doc_id=doc_id)

Code
nodes = []
# transcript_array refers to an array of phrases that Whisper outputs.
for index, phrase in enumerate(transcript_array):
    # current obj index
    node = Node(text=phrase['content'] + " " + str(phrase['start']), doc_id=index)
    if index > 0 and index < len(transcript_array) - 1:
        node.relationships[DocumentRelationship.PREVIOUS] = index - 1
        node.relationships[DocumentRelationship.NEXT] = index + 1
    elif index == 0:
        node.relationships[DocumentRelationship.NEXT] = index + 1
    elif index == len(transcript_array) - 1:
        node.relationships[DocumentRelationship.PREVIOUS] = index - 1
    nodes.append(node)
index = GPTSimpleVectorIndex(nodes)

Could it be from my custom nodes? I have attached a txt file showing what they look like when I print(nodes). I am following the tutorial here, so some help would be really appreciated: https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html
5 comments
hey @BioHacker, thanks for surfacing. this could be classified as a bug, but i can explain the reason: if you want to insert into a vector index, the source relationship needs to be defined on each node, e.g. node.relationships[DocumentRelationship.SOURCE] = "the source doc id"
cc @disiok - this is probably a check we can remove?
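To make the suggestion above concrete, here is a rough sketch of the node-building loop with the SOURCE relationship set on every node. It deliberately does not import llama_index: the Node and DocumentRelationship classes below are small stand-ins that only mirror the names used in this thread, and the ref_doc_id property is an assumption about how the "reference doc id" in the error message is derived. It also assumes that "the source doc id" means the id of the document the nodes were cut from (here the made-up value "transcript-1"), which the thread does not confirm, and it uses string ids rather than the integers in the original post.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional


class DocumentRelationship(Enum):
    # Stand-in for the library enum named in the thread (not the real class).
    SOURCE = auto()
    PREVIOUS = auto()
    NEXT = auto()


@dataclass
class Node:
    # Stand-in for the library's Node; only the fields used in the thread.
    text: str
    doc_id: str
    relationships: Dict[DocumentRelationship, str] = field(default_factory=dict)

    @property
    def ref_doc_id(self) -> Optional[str]:
        # Assumption: the "reference doc id" is read from the SOURCE relationship.
        return self.relationships.get(DocumentRelationship.SOURCE)


def build_nodes(transcript_array: List[dict], source_doc_id: str) -> List[Node]:
    """Build one node per phrase, linking each to its neighbours and its source."""
    nodes = []
    last = len(transcript_array) - 1
    for i, phrase in enumerate(transcript_array):
        node = Node(text=f"{phrase['content']} {phrase['start']}", doc_id=str(i))
        # The relationship the reply says must be defined before inserting.
        node.relationships[DocumentRelationship.SOURCE] = source_doc_id
        if i > 0:
            node.relationships[DocumentRelationship.PREVIOUS] = str(i - 1)
        if i < last:
            node.relationships[DocumentRelationship.NEXT] = str(i + 1)
        nodes.append(node)
    return nodes


# Hypothetical Whisper-style phrases, shaped like the ones in the question.
transcript_array = [
    {"content": "hello", "start": 0.0},
    {"content": "world", "start": 1.5},
    {"content": "bye", "start": 3.0},
]
nodes = build_nodes(transcript_array, source_doc_id="transcript-1")
```

With the real library, the same one-line SOURCE assignment would go inside the original loop before nodes.append(node). Splitting the neighbour links into two independent if checks also covers a one-phrase transcript, where neither PREVIOUS nor NEXT should be set.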
Thanks for responding @jerryjliu0. What exactly do you mean by source doc id? Is it the id of the document the nodes come from, or the id of the node itself? We should add this to the tutorial here: https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html
Thanks in advance!
yeah it's a good point