Updated 2 months ago

Understanding Empty Entity List in Graph RAG Example

Hi, I'm not sure I should be asking this here, but here it goes 🙂

I've been trying to follow this graph-RAG example https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v2/, and I cannot understand why the list of entities I get is empty.

I can see that I get 4 TextNodes/NodeWithScore matches for the first example query. But since the text in them is just the original plain text (no extra formatting), the regex that looks for entities and relationships doesn't get any matches (this one: pattern = r"(\w+(?:\s+\w+))\s({[^}]})\s->\s([^(]+?)\s({[^}]})\s->\s(\w+(?:\s+\w+))").

My question is: should nodes_retrieved contain something other than TextNodes/NodeWithScore that just hold the original text? If so, how can one control this?

For reference, I've set up a local instance of Neo4j where I indexed everything, as in the example.
I attached what one of the retrieved nodes looks like.
Attachment: image.png
9 comments
maybe @ravitheja could help
I've been struggling with this for a few weeks now, unsure whether I was doing something wrong. I found it particularly difficult to troubleshoot because it feels (to me at least) like there are many layers of abstraction on top of the "raw" data stored in the graph DB.

Any help is appreciated.

I am also looking for alternate implementations, as I really want to have a working POC to establish a baseline.
@miha5754 is the issue coming from the graph database or the LLM?

Can you follow the GraphRAG_v1 implementation and let us know if that works?

The prompts and regex that we provided in the guide work best with gpt-4, and I noticed you are using gpt-4o-mini.
@miha5754 the pattern no longer seems to work because the generated text is different. Please replace it with the following pattern in get_entities and see if it works. Otherwise, please look at the text in the retrieved nodes and adjust the pattern accordingly. You basically need to debug GraphRAGQueryEngine.

pattern = r"(\w+(?:\s+\w+))\s->\s([^-]+?)\s->\s(\w+(?:\s+\w+))"
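For anyone debugging this, here is a rough sketch of how one might check which lines of the retrieved nodes a given pattern actually matches. Everything in it is an assumption for illustration: `split_by_match` is a hypothetical helper, the nodes are stand-ins built with `SimpleNamespace` (real ones would come from the retriever and expose `.text`), and the sample pattern is a placeholder, not the one from the guide.

```python
import re
from types import SimpleNamespace


def split_by_match(nodes, pattern):
    """Return (matched, unmatched) non-empty lines across all node texts."""
    regex = re.compile(pattern)
    matched, unmatched = [], []
    for node in nodes:
        for line in node.text.split("\n"):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            (matched if regex.search(line) else unmatched).append(line)
    return matched, unmatched


# Stand-in for retrieved nodes (hypothetical data, not real output).
sample_nodes = [
    SimpleNamespace(text="Alice -> works at -> Acme\nSome plain sentence.\n"),
]
# Placeholder pattern for illustration only.
sample_pattern = r"^(.+?)\s*->\s*(.+?)\s*->\s*(.+)$"

matched, unmatched = split_by_match(sample_nodes, sample_pattern)
print("matched:", matched)
print("unmatched:", unmatched)
```

If the unmatched list contains lines that look like triplets, the pattern likely needs adjusting to the format the LLM produced.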
Hi, thanks for answering. I arrived at the same conclusion yesterday after going through the whole thing again.

Now I just wanted to write my conclusion, which is the same as yours 🙂
Cool. Glad you figured it out. 👍
I actually found that there was a small issue with this and that not all entities were properly extracted.

The fix:

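        # Inside get_entities; requires `import re` at the top of the module.
        # Each triplet is expected on its own line as "subject -> relation -> object".
        pattern = re.compile(r"^(.*?)\s->\s(.*?)\s->\s(.*?)$")

        entities = set()
        for node in nodes_retrieved:
            for line in node.text.split('\n'):
                match = pattern.match(line.strip())
                if match:
                    # Groups 1 and 3 are the entities; group 2 is the relation.
                    subject, obj = match.group(1), match.group(3)
                    entities.add(subject)
                    entities.add(obj)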
Cool. It entirely depends on how the text is extracted from the LLM. Seems like it's extracting in a different way.
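Since the exact format can vary, one possible variation (not what the guide does) is to split each line on "->" instead of using a regex, which tolerates different amounts of whitespace around the arrows. This is only a sketch under assumptions: `extract_entities` is a hypothetical name, the nodes are stand-ins with a `.text` attribute, and it assumes neither entity names nor relations contain "->" themselves.

```python
from types import SimpleNamespace


def extract_entities(nodes):
    """Collect subjects and objects from 'subject -> relation -> object' lines."""
    entities = set()
    for node in nodes:
        for line in node.text.split("\n"):
            parts = [part.strip() for part in line.split("->")]
            # Keep only lines with exactly three non-empty parts.
            if len(parts) == 3 and all(parts):
                subject, _relation, obj = parts
                entities.add(subject)
                entities.add(obj)
    return entities


# Stand-in node with made-up text, for illustration only.
sample_nodes = [
    SimpleNamespace(
        text="Alice -> works at -> Acme\nAcme->based in->  Berlin\nNo triplet here."
    ),
]
entities = extract_entities(sample_nodes)
print(sorted(entities))
```

Lines without exactly two arrows are skipped, so plain-text chunks simply contribute no entities.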