Hello,
I am trying to implement GraphRAG using my own knowledge graph in Neo4j. I used Neo4jPropertyGraphStore and then PropertyGraphIndex.from_existing() to work with my own KG.

I noticed that I need to label all my nodes as __Entity__ and __Node__ for it to work correctly. However, I'm encountering the following error:


ValidationError: 1 validation error for EntityNode
name
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type

I checked the provided URL, but I couldn't find any useful information to resolve this issue. Could you please provide guidance or help me understand what might be causing this error?

Thank you!
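The ValidationError says EntityNode received name=None, which suggests that at least one node carrying the __Entity__ label has no name property. (This is my reading of the error: I'm assuming Neo4jPropertyGraphStore reads each entity's name property when it rebuilds EntityNode objects.) Below is a minimal sketch that finds and backfills such nodes. It assumes the official neo4j Python driver and a hypothetical fallback property, such as title, that already identifies your nodes. The function names are made up for illustration.

```python
# Assumption: llama-index builds EntityNode(name=...) from each __Entity__ node's
# `name` property, so any __Entity__ node without `name` triggers the error above.

COUNT_MISSING = """
MATCH (n:__Entity__)
WHERE n.name IS NULL
RETURN count(n) AS missing
"""

# n[$fallback_key] is Cypher's dynamic property lookup with a parameterised key.
BACKFILL = """
MATCH (n:__Entity__)
WHERE n.name IS NULL AND n[$fallback_key] IS NOT NULL
SET n.name = toString(n[$fallback_key])
RETURN count(n) AS updated
"""


def count_missing_names(session) -> int:
    """Count __Entity__ nodes that have no `name` property."""
    record = session.run(COUNT_MISSING).single()
    return record["missing"]


def backfill_names(session, fallback_key: str) -> int:
    """Copy a hypothetical identifying property (e.g. "title") into `name`."""
    # Extra keyword arguments to session.run() are sent as query parameters.
    record = session.run(BACKFILL, fallback_key=fallback_key).single()
    return record["updated"]
```

To use it, open a session with GraphDatabase.driver(uri, auth=(user, password)) and driver.session(). Then call count_missing_names(session) before and after backfill_names(session, "title"). Nodes that have neither property are left untouched and still need a name from somewhere.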


The name from_existing might be misleading: it means an existing graph created by llama-index 😅 It requires entities and relations to have a certain structure.

Wdym? So if I have a neo4j graph store I still need to use from_documents? Can't I use from_existing and then manually insert the documents using index.insert?

@Arthur if the graph was already created with the property graph index, go ahead and use from_existing as is 👍

You can technically use from_existing and then insert() as well, but I think querying will only cover the data you inserted (unless you are using some of the Cypher retrievers).

Hmmm I see

Can I choose which community to insert my nodes into? For example, separate communities: one for one kind of document, another for a different kind, etc.

What I found weird about this naming is that from_documents seems to create a temporary index, and once it's created I can't add more docs.

from_existing suggests that I will connect to a persistent index and then use the insert method to add docs to it.

from_documents does not create a temporary index. You can still call insert() after from_documents or from_existing.

The intention is that from_documents creates a new/fresh index (or at least, that's the intended usage).

from_existing is for loading a graph you created

this is the exact same as VectorStoreIndex.from_documents() vs. VectorStoreIndex.from_vector_store() if you've used that class before

I see. So if I made an endpoint that lets users insert data into my existing graph store by passing the entities, relationships, and validation schema in the request, I could use either from_documents or from_existing? The user will submit the file too.

Yea I think so?
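For an endpoint like that, you can convert the entities, relations, and validation schema from the request into SchemaLLMPathExtractor arguments. The sketch below assumes a hypothetical JSON payload shape. It also assumes the possible_entities, possible_relations, kg_validation_schema, and strict parameters used in the llama-index property graph examples. The helper names are made up. Checking the triples up front rejects a malformed request before any LLM calls are made.

```python
from typing import Dict, List, Literal, Tuple

Triple = Tuple[str, str, str]


def schema_from_request(payload: Dict) -> Tuple[object, object, List[Triple]]:
    """Turn a request body into schema arguments for SchemaLLMPathExtractor.

    Hypothetical payload shape:
      {"entities": ["PERSON", ...], "relations": ["WORKS_AT", ...],
       "validation_schema": [["PERSON", "WORKS_AT", "ORGANIZATION"], ...]}
    """
    entities = list(dict.fromkeys(payload["entities"]))  # dedupe, keep order
    relations = list(dict.fromkeys(payload["relations"]))
    if not entities or not relations:
        raise ValueError("entities and relations must be non-empty")

    triples: List[Triple] = [tuple(t) for t in payload["validation_schema"]]
    for head, rel, tail in triples:  # also fails if a triple is not length 3
        if head not in entities or tail not in entities:
            raise ValueError(f"unknown entity type in {(head, rel, tail)}")
        if rel not in relations:
            raise ValueError(f"unknown relation type in {(head, rel, tail)}")

    # Literal[...] built at runtime from strings, mirroring the typed form in the docs
    return Literal[tuple(entities)], Literal[tuple(relations)], triples


def make_extractor(payload: Dict, llm):
    """Build one extractor per request; llm is any llama-index LLM."""
    # Imported lazily so the validation helper above works without llama-index.
    from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

    entities, relations, triples = schema_from_request(payload)
    return SchemaLLMPathExtractor(
        llm=llm,
        possible_entities=entities,
        possible_relations=relations,
        kg_validation_schema=triples,
        strict=True,  # drop extracted triples that don't match the schema
    )
```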

I think this is confusing lol

Going by the naming conventions, it doesn't seem to be persistent.

I think it's fine? I'm not really sure where the confusion is.

from_documents() is for creating a new index

from_existing() is for loading/connecting to something existing

Seems straightforward?

Most examples in the LangChain and LlamaIndex documentation demonstrate implementations using methods like from_documents and jump straight into the retrieval phase. This gives the impression that ingestion and retrieval happen together throughout the application's lifecycle, which is often not the case in production, where ingestion and retrieval are typically distinct phases. The from_documents method seems to assume that the location of the documents is already known (e.g., a folder) and, based on its name, doesn't appear to support adding more documents after the initial ingestion. Additionally, if I set up a separate ingestion endpoint and used from_documents to load documents into a property graph, it seems like this would create a new property graph index every time the endpoint is called.
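One way to keep the two phases apart is sketched below. It assumes the graph lives in a persistent store such as Neo4jPropertyGraphStore, so both the ingestion endpoint and the query endpoint reconnect with from_existing() and only ingestion calls insert(). Nothing is rebuilt per request. The function names are hypothetical. The LLM and embedding model come from the global llama-index Settings unless you configure them.

```python
def ingest(texts, graph_store, kg_extractors):
    """Ingestion phase (e.g. behind an upload endpoint): add docs to the existing graph."""
    # Imported lazily so this module can be loaded without llama-index installed.
    from llama_index.core import Document, PropertyGraphIndex

    index = PropertyGraphIndex.from_existing(
        property_graph_store=graph_store,
        kg_extractors=kg_extractors,  # same extractors that built the graph
    )
    for text in texts:
        # insert() runs the extractors on the new document and upserts into the store
        index.insert(Document(text=text))
    return index


def answer(question, graph_store):
    """Query phase: reconnect to the same store; nothing is re-ingested."""
    from llama_index.core import PropertyGraphIndex

    index = PropertyGraphIndex.from_existing(property_graph_store=graph_store)
    return index.as_query_engine().query(question)
```

from_documents() still fits a one-off job that builds the graph the first time. After that, the endpoints only need from_existing().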

I think it's not persistent unless you call storage_context.persist().
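That applies to the default in-memory graph store (SimplePropertyGraphStore). With Neo4jPropertyGraphStore, the graph is written to Neo4j as it is built, so reconnecting with from_existing() should be enough. For the in-memory case, here is a minimal sketch of the persist/reload round trip. The function names are hypothetical, and ./storage is a placeholder directory.

```python
def persist_index(index, persist_dir: str = "./storage") -> None:
    """Write the index's storage context (docstore, stores, index metadata) to disk."""
    index.storage_context.persist(persist_dir=persist_dir)


def load_index(persist_dir: str = "./storage"):
    """Rebuild the index from what persist_index() wrote."""
    # Imported lazily so this module can be loaded without llama-index installed.
    from llama_index.core import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    return load_index_from_storage(storage_context)
```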

So is there no way to chat with my custom graph using LlamaIndex?

Based on what you told me, I tried to simulate my graph using PropertyGraphIndex with SchemaLLMPathExtractor to build a knowledge graph from some documents.
The graph was created perfectly and seems somewhat meaningful; however, when I run a retrieval, it returns an empty list.

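from llama_index.core import PropertyGraphIndex

# documents, kg_extractor (SchemaLLMPathExtractor), pg_store (Neo4jPropertyGraphStore),
# vec_store and query are defined earlier
index_schema = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    property_graph_store=pg_store,
    vector_store=vec_store,  # keyword argument is vector_store
    show_progress=True,
)

nodes = index_schema.as_retriever().retrieve(query)  # list of NodeWithScore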
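An empty list from the combined retriever doesn't tell you which path failed. As far as I know, as_retriever() on a PropertyGraphIndex combines sub-retrievers: an LLM synonym/keyword retriever and, when embeddings are available, a vector-based one. The small hypothetical helper below runs each retriever on its own and counts the hits, which might narrow the problem down.

```python
from typing import Dict, Protocol, Sequence


class SupportsRetrieve(Protocol):
    """Anything with a .retrieve(query) method, such as a llama-index retriever."""

    def retrieve(self, query: str) -> Sequence: ...


def count_hits(retrievers: Dict[str, SupportsRetrieve], query: str) -> Dict[str, int]:
    """Run each retriever separately and report how many nodes it returned."""
    return {name: len(r.retrieve(query)) for name, r in retrievers.items()}
```

For example, you could pass LLMSynonymRetriever(index_schema.property_graph_store, llm=llm) and VectorContextRetriever(index_schema.property_graph_store, embed_model=embed_model, vector_store=vec_store) from llama_index.core.indices.property_graph. If only the vector path is empty, the embeddings probably didn't end up where the retriever looks. If both are empty, the query terms may not match any extracted entity names.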