
Updated 2 months ago


any best practices for adding/inserting data into a Property Graph?

if i run this twice (with different documents), the 2nd set of documents does seem to get inserted into my neo4j knowledge graph, and the resulting query engine can still answer questions about information from the first set of documents. so at least with the underlying neo4j KG, the default seems to be to insert the new entities without deleting what is already there.

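from llama_index.core import PropertyGraphIndex

# documents, kg_extractor, llm, embed_model and graph_store are set up earlier
index = PropertyGraphIndex.from_documents(
    documents[:NUMBER_OF_ARTICLES],
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)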


that said, i'm still a bit traumatized from the normal vector database document persist/load caching stuff that is required, and i just want to make sure i know how to add docs to an existing property graph without any risk of deleting what is already there.

and related - how do i properly compute the new relationships, if i'm inserting stuff into an existing property graph?

this would be a super helpful example to have, i'm sure a lot of people aren't going to want to re-build the whole property graph every time they think of a new doc to add....
You can definitely do index.insert(document) or index.insert_nodes(nodes)

All the graph stuff should automatically be upserts (upserting nodes or relations that are the same), so there shouldn't be a risk of deleting 🤔
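
To make the upsert idea concrete, here is a minimal toy sketch in plain Python. It is not how LlamaIndex or Neo4j implement it; `ToyGraphStore`, `upsert_node` and `upsert_relation` are hypothetical names made up for illustration. It only shows the behaviour described above: writing the same node or relation again merges into the existing entry, and nothing that is already stored gets removed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraphStore:
    """Hypothetical in-memory stand-in for a property graph store."""

    nodes: dict = field(default_factory=dict)      # node id -> properties
    relations: dict = field(default_factory=dict)  # (source, label, target) -> properties

    def upsert_node(self, node_id, **properties):
        # Merge into the existing entry if the id is already present,
        # otherwise create it. No other entry is touched.
        self.nodes.setdefault(node_id, {}).update(properties)

    def upsert_relation(self, source, label, target, **properties):
        # Relations are keyed by their full triplet, so a repeat is a no-op merge.
        key = (source, label, target)
        self.relations.setdefault(key, {}).update(properties)


store = ToyGraphStore()

# First batch of extracted triplets.
store.upsert_node("Polymer", kind="material")
store.upsert_node("Monomer", kind="molecule")
store.upsert_relation("Polymer", "MADE_OF", "Monomer")

# Second batch: one repeated node, one new node, one repeated relation, one new relation.
store.upsert_node("Polymer", source="wikipedia")
store.upsert_node("Plastic", kind="material")
store.upsert_relation("Polymer", "MADE_OF", "Monomer")
store.upsert_relation("Plastic", "IS_A", "Polymer")

print(sorted(store.nodes))   # three distinct nodes, no duplicates
print(len(store.relations))  # two distinct relations
```

After the second batch the first batch is still there, and the repeated "Polymer" node has simply picked up the extra property.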
ok great, so the first time i build, i can do index = PropertyGraphIndex.from_documents(...), but then if i want to add more, i can do index.insert(document)?
just tried that...(using the wikipedia loader).

add = reader.load_data(pages=["Polymer"])
index.insert(add)

and getting this trace:

AttributeError                            Traceback (most recent call last)
Cell In[51], line 2
      1 add = reader.load_data(pages=["Polymer"])
----> 2 index.insert(add)

File ~/Documents/GitHub/llama_index/.venv/lib/python3.11/site-packages/llama_index/core/indices/base.py:236, in BaseIndex.insert(self, document, **insert_kwargs)
    234 """Insert a document."""
    235 with self._callback_manager.as_trace("insert"):
--> 236     nodes = run_transformations(
    237         [document],
    238         self._transformations,
    239         show_progress=self._show_progress,
    240     )
    242     self.insert_nodes(nodes, **insert_kwargs)
    243     self.docstore.set_document_hash(document.get_doc_id(), document.hash)

File ~/Documents/GitHub/llama_index/.venv/lib/python3.11/site-packages/llama_index/core/ingestion/pipeline.py:130, in run_transformations(nodes, transformations, in_place, cache, cache_collection, **kwargs)
    128             cache.put(hash, nodes, collection=cache_collection)
    129     else:
--> 130         nodes = transform(nodes, **kwargs)
    132 return nodes

File ~/Documents/GitHub/llama_index/.venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
...
    127         CBEventType.NODE_PARSING, payload={EventPayload.DOCUMENTS: documents}
    128     ) as event:
    129         nodes = self._parse_nodes(documents, show_progress=show_progress, **kwargs)

AttributeError: 'list' object has no attribute 'id_'
but this works:

add = reader.load_data(pages=["Polymer"])


index = PropertyGraphIndex.from_documents(
    add,
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)
just a list error: load_data returns a list, and insert only takes one document at a time...
add = reader.load_data(pages=["Polymer"])
index.insert(add[0])
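
If the reader returns more than one document, a small loop avoids indexing into the list by hand. This is only a sketch: `insert_all` and `RecordingIndex` are hypothetical names, and `RecordingIndex` stands in for a real index so the snippet runs without a graph store. The only assumption about the real index is the one confirmed in the thread, that `index.insert` accepts a single document.

```python
def insert_all(index, documents):
    """Insert one document or a list of documents, one insert() call each."""
    # Accept a bare document as well as a list, so callers cannot
    # accidentally hand a whole list to a single insert() call.
    if not isinstance(documents, (list, tuple)):
        documents = [documents]
    for doc in documents:
        index.insert(doc)
    return len(documents)


class RecordingIndex:
    """Hypothetical stand-in for a real index; it only records what was inserted."""

    def __init__(self):
        self.inserted = []

    def insert(self, document):
        self.inserted.append(document)


demo_index = RecordingIndex()
inserted_count = insert_all(demo_index, ["doc-a", "doc-b"])
print(inserted_count, demo_index.inserted)
```

With the real objects from this thread the call would be insert_all(index, add), which covers every page the reader loaded rather than just add[0].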