Understanding Empty Entity List in Graph RAG Example

miha5754

Hi, I'm not sure I should be asking this here, but here it goes 🙂

I've been trying to follow this graph-RAG example https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v2/, and I cannot understand why the list of entities I get is empty.

I can see that I get 4 TextNodes/NodeWithScore matches for the first example query. But since the text in them is just the original plain text (no extra formatting), the regex that looks for entities and relationships doesn't get any matches (this one: pattern = r"(\w+(?:\s+\w+)*)\s*\({[^}]*}\)\s*->\s*([^(]+?)\s*\({[^}]*}\)\s*->\s*(\w+(?:\s+\w+)*)").

My question is: should nodes_retrieved contain something other than TextNodes/NodeWithScore objects that just hold the original text? If so, how can one control this?

For reference, I've set up a local instance of Neo4j where I indexed everything, as in the example.
I attached what one of the retrieved nodes looks like.



Logan M

maybe @ravitheja could help

miha5754

I've been struggling with this for a few weeks now, unsure whether I was doing something wrong. I found it particularly difficult to troubleshoot because there seem (to me at least) to be many layers of abstraction on top of the "raw" data stored in the graph DB.

Any help is appreciated.

I am also looking for alternate implementations, as I really want to have a working POC to establish a baseline.

miha5754

This issue was also reported on GitHub two weeks ago: https://github.com/run-llama/llama_index/issues/15173#issuecomment-2363187225

ravitheja

@miha5754 is the issue coming from the graph database or the LLM?

Can you follow the GraphRAG_v1 implementation and let us know if that works?

The prompts and regex that we provided in the guide work best with gpt-4, and I noticed you are using gpt-4o-mini.

ravitheja

@miha5754 the pattern no longer seems to work because the generated text is different. Please replace it with the following pattern in get_entities and see if it works. Otherwise, please look at the text in the retrieved nodes and adjust the pattern accordingly. You basically need to debug GraphRAGQueryEngine.

pattern = r"(\w+(?:\s+\w+)*)\s*->\s*([^-]+?)\s*->\s*(\w+(?:\s+\w+)*)"
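
A quick way to follow the "look at the text and adjust the pattern" advice is to run a candidate pattern over each line of a node's text and see which lines it misses. The sketch below is only an illustration: the sample text and the helper name split_by_match are made up, and the pattern is the one above with its `*` quantifiers written out (an assumption about how it was originally typed). In practice you would pass in node.text from nodes_retrieved.

```python
import re

# Candidate pattern: plain "Entity -> relation -> Entity" triplets.
# The `*` quantifiers are an assumption about the intended pattern.
PATTERN = r"(\w+(?:\s+\w+)*)\s*->\s*([^-]+?)\s*->\s*(\w+(?:\s+\w+)*)"


def split_by_match(pattern, text):
    """Return (matched, unmatched) lists of the non-empty lines in `text`."""
    matched, unmatched = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.search(pattern, line):
            matched.append(line)
        else:
            unmatched.append(line)
    return matched, unmatched


# Hypothetical node text: one triplet line and one plain sentence.
sample_text = (
    "Alice Smith -> works at -> Acme Corp\n"
    "Alice Smith joined Acme Corp in 2019.\n"
)

matched, unmatched = split_by_match(PATTERN, sample_text)
print("matched:", matched)
print("unmatched:", unmatched)
```

If every line of the real node text ends up in the unmatched list, the pattern does not fit the format the LLM produced, which is the empty-entity-list symptom described at the top of the thread.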

miha5754

Hi, thanks for answering. I arrived at the same conclusion yesterday after going through the whole thing again.

Now I just wanted to write my conclusion, which is the same as yours 🙂

ravitheja

Cool. Glad you figured it out. 👍

miha5754

I actually found that there was a small issue with this and that not all entities were properly extracted.

The fix:


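        # Requires `import re` at the top of the module.
        # One "subject -> relation -> object" triplet is expected per line.
        pattern = re.compile(r"^(.*?)\s->\s(.*?)\s->\s(.*?)$")

        entities = set()
        for node in nodes_retrieved:
            for line in node.text.split("\n"):
                match = pattern.match(line.strip())
                if match:
                    # Groups 1 and 3 are the entities; group 2 is the relation.
                    subject, obj = match.group(1), match.group(3)
                    entities.add(subject)
                    entities.add(obj)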
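
To try the line-by-line approach without a running Neo4j instance, here is a minimal sketch that uses stand-in objects with a text attribute in place of the retrieved nodes. The sample triplets and the function name extract_entities are hypothetical. It also compares against running the earlier unanchored pattern (with `*` quantifiers assumed) over the whole text at once. One possible explanation for the missing entities, not confirmed in the thread, is that `\s` also matches newlines, so a greedy group can run across a line break and swallow the start of the next triplet.

```python
import re
from types import SimpleNamespace

# Line-anchored pattern from the fix above.
LINE_PATTERN = re.compile(r"^(.*?)\s->\s(.*?)\s->\s(.*?)$")
# Unanchored pattern suggested earlier in the thread (quantifiers assumed).
LOOSE_PATTERN = r"(\w+(?:\s+\w+)*)\s*->\s*([^-]+?)\s*->\s*(\w+(?:\s+\w+)*)"


def extract_entities(nodes):
    """Collect subjects and objects from 'a -> rel -> b' lines, one line at a time."""
    entities = set()
    for node in nodes:
        for line in node.text.split("\n"):
            match = LINE_PATTERN.match(line.strip())
            if match:
                entities.add(match.group(1))
                entities.add(match.group(3))
    return entities


# Stand-ins for retrieved nodes; only the `.text` attribute is used.
nodes = [
    SimpleNamespace(
        text="Alice Smith -> works at -> Acme Corp\nAcme Corp -> located in -> Berlin"
    ),
    SimpleNamespace(text="A plain sentence without any triplet."),
]

entities = extract_entities(nodes)
print(sorted(entities))

# Whole-text matching: the last group runs across the newline here,
# so the second triplet (and "Berlin") is never reported.
whole_text_matches = re.findall(LOOSE_PATTERN, nodes[0].text)
print(whole_text_matches)
```

Splitting on newlines first keeps each match confined to a single triplet, which is why the per-line version recovers all three entities in this sample.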

ravitheja

Cool. It depends entirely on how the LLM formats the extracted text. It seems to be extracting it in a different way.
