any best practices for adding/inserting

At a glance

The community members are discussing best practices for adding or inserting data into a Property Graph. The original post mentions that when running the code twice with different documents, the second set of documents is inserted into the Neo4j knowledge graph, and the query engine can still answer questions about information from the first set of documents. The community member is concerned about the risk of deleting existing data when adding new documents and wants to ensure they know how to add documents to an existing property graph without any risk of deleting what is already there. The comments suggest that the community member can use index.insert(document) or index.insert_nodes(nodes) to add new data, and the graph should automatically handle upserting (updating or inserting) nodes or relations that are the same, without deleting existing data. However, the community member encountered an error when trying to use index.insert(add), where add was a list of documents. The solution provided is to insert the documents one at a time, using index.insert(add[0]). There is no explicitly marked answer in the post or comments.

rrawwerks

any best practices for adding/inserting data into a Property Graph?

if i run this twice (with different documents), it does seem that the 2nd set of documents is inserted into my neo4j knowledge graph, and the resulting query-engine can still answer questions about information from first set of documents. so at least with the underlying neo4j KG, it seems like the default is to insert the new entities, but not delete what is there.

index = PropertyGraphIndex.from_documents(
    documents[:NUMBER_OF_ARTICLES],
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)

that said, i'm still a bit traumatized from the normal vector database document persist/load caching stuff that is required, and i just want to make sure i know how to add docs to an existing property graph without any risk of deleting what is already there.

and related - how do i properly compute the new relationships, if i'm inserting stuff into an existing property graph?

this would be a super helpful example to have, i'm sure a lot of people aren't going to want to re-build the whole property graph every time they think of a new doc to add....

6 comments

Logan M

You can definitely do index.insert(document) or index.insert_nodes(nodes)

All the graph stuff should automatically be upserts (upserting nodes or relations that are the same), so there shouldn't be a risk of deleting 🤔

rrawwerks

ok great, so the first time i build, i can do index = PropertyGraphIndex.from_documents(...), but then if i want to add more, i can do index.insert(document)?

Logan M

Yes 🙏

rrawwerks

just tried that...(using the wikipedia loader).

add = reader.load_data(pages=["Polymer"])
index.insert(add)

and getting this trace:

AttributeError                            Traceback (most recent call last)
Cell In[51], line 2
      1 add = reader.load_data(pages=["Polymer"])
----> 2 index.insert(add)

File ~/Documents/GitHub/llama_index/.venv/lib/python3.11/site-packages/llama_index/core/indices/base.py:236, in BaseIndex.insert(self, document, **insert_kwargs)
    234 """Insert a document."""
    235 with self._callback_manager.as_trace("insert"):
--> 236     nodes = run_transformations(
    237         [document],
    238         self._transformations,
    239         show_progress=self._show_progress,
    240     )
    242     self.insert_nodes(nodes, **insert_kwargs)
    243     self.docstore.set_document_hash(document.get_doc_id(), document.hash)

File ~/Documents/GitHub/llama_index/.venv/lib/python3.11/site-packages/llama_index/core/ingestion/pipeline.py:130, in run_transformations(nodes, transformations, in_place, cache, cache_collection, **kwargs)
    128             cache.put(hash, nodes, collection=cache_collection)
    129     else:
--> 130         nodes = transform(nodes, **kwargs)
    132 return nodes

File ~/Documents/GitHub/llama_index/.venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
...
    127         CBEventType.NODE_PARSING, payload={EventPayload.DOCUMENTS: documents}
    128     ) as event:
    129         nodes = self._parse_nodes(documents, show_progress=show_progress, **kwargs)

AttributeError: 'list' object has no attribute 'id_'

rrawwerks

but this works:

add = reader.load_data(pages=["Polymer"])


index = PropertyGraphIndex.from_documents(
    add,
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)

rrawwerks

just a list error, insert only takes one at a time...

add = reader.load_data(pages=["Polymer"])
index.insert(add[0])
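
Since reader.load_data() returns a list and index.insert() takes a single document, a loader that returns several documents needs one insert() call per document. A minimal sketch of that loop follows. insert_documents is a hypothetical helper name, not part of LlamaIndex, and the only thing it assumes about index is the insert(document) method used in the thread above. Whether re-inserted entities are merged rather than duplicated depends on the graph store's upsert behaviour described earlier in the thread, which this sketch does not check.

```python
from typing import Any, Iterable


def insert_documents(index: Any, documents: Iterable[Any]) -> int:
    """Insert each document separately via index.insert() and return the count.

    Hypothetical helper, not part of LlamaIndex. `index` is assumed to expose
    insert(document), as the PropertyGraphIndex in the thread above does.
    """
    count = 0
    for document in documents:
        # insert() wraps its argument in a list internally (see the traceback
        # above), so it must receive one document, never a list of documents.
        index.insert(document)
        count += 1
    return count


# Usage, assuming `index` and `reader` are already set up as in the posts above:
# add = reader.load_data(pages=["Polymer"])
# insert_documents(index, add)
```

This keeps the existing index object and graph store in place, so nothing has to be rebuilt with from_documents() when new documents arrive.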
