Nodes

prashanth

@Logan M I think you might be the best person to point me to where I can look in the code base to fix this. I implemented the Kùzu Property Graph Index, and there's one thing that's been bugging me regarding the output of the .retrieve method of a graph retriever object. I'm trying to display all Llama nodes obtained as a result for a given retriever query string.

nodes = kg_index.as_retriever(include_text=False).retrieve("Marie Curie")
for node in nodes:
    print(node.text)

Here's what I'd expect to get (based on the outputs for other graph DBs):

Marie Curie -> DISCOVERED -> radium

But what I actually get is below. The triplet is buried in there, but each node's properties are also converted to a string and concatenated with the (src)-[rel]->(target) string output.

Marie Curie ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 13), 'last_modified_date': datetime.date(2024, 9, 13), 'file_name': 'curie.txt', 'file_path': '/code/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '4014e554-016d-4f90-b743-f093c7677fa5'}) -> DISCOVERED -> radium ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 13), 'last_modified_date': datetime.date(2024, 9, 13), 'file_name': 'curie.txt', 'file_path': '/code/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '4014e554-016d-4f90-b743-f093c7677fa5'})

Is there somewhere I should be implementing a custom __str__ or __repr__ method so that my retriever outputs the string correctly? I know I'm missing something.

Logan M

Yea it's coming from here
https://github.com/run-llama/llama_index/blob/75684082599504e8ae3c31ab580b418a663372b3/llama-index-core/llama_index/core/graph_stores/types.py#L67

Which is then used here
https://github.com/run-llama/llama_index/blob/75684082599504e8ae3c31ab580b418a663372b3/llama-index-core/llama_index/core/indices/property_graph/sub_retrievers/base.py#L60

Logan M

I don't know what the correct way to handle metadata is just yet

Logan M

But PRs are welcome

prashanth

Thank you!

prashanth

Will take a look, shouldn't be hard to fix

prashanth

I have a feeling that this is highly specific to Kùzu, because none of the other graph DBs seem to have anything in their implementations that customizes the string representation of an entity node

prashanth

Let me see what Kùzu is outputting, and it might be a custom fix in my case

Logan M

hmm I don't think it's specific to Kùzu actually, I think it's just due to the base EntityNode class forcing metadata into the string representation
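To make the point above concrete, here is a minimal sketch of how a node class whose string method appends its metadata produces the noisy output from the original question. SketchEntityNode is a hypothetical stand-in written for illustration, not the LlamaIndex EntityNode class, and the behaviour it shows (appending the properties dict whenever it is non-empty) is an assumption based on the output quoted earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchEntityNode:
    """Hypothetical stand-in for an entity node; not the LlamaIndex class."""

    name: str
    label: str = "entity"
    properties: Dict[str, Any] = field(default_factory=dict)

    def __str__(self) -> str:
        # Assumed behaviour: append the properties dict whenever it is non-empty.
        if self.properties:
            return f"{self.name} ({self.properties})"
        return self.name


# A node without metadata prints as just its name.
bare = SketchEntityNode(name="radium")

# A node whose graph store attached file metadata prints name plus the dict.
with_meta = SketchEntityNode(
    name="Marie Curie",
    label="PERSON",
    properties={"file_name": "curie.txt", "file_size": 1830},
)

print(str(bare))
print(str(with_meta))
```

Under this assumption, any graph store that fills in node properties would show the same symptom, which matches the suggestion that the behaviour is not specific to one database.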

prashanth

Oh ok!

prashanth

Why do all the example notebooks on property graph index show the right formatting then? Has something changed in more recent versions of LlamaIndex?

prashanth

Let me give this a try using Neo4j and see if the same thing happens

Logan M

probably this was updated after those notebooks were run

prashanth

@Logan M where in the code does the -> get added? The level of abstraction is such (or maybe it's just me) that I can't seem to diagnose how/where the string Interleaf -> Was -> Company gets the -> from

prashanth

The retriever.retrieve results from Neo4j seem to be outputting the chunk texts for nodes of type NodeWithScore, which seems to be a subclass of ChunkNode. So I can see why node.text outputs just the chunk's text. But I'm not able to find any part of the code in types.py that formats node.text with the edge patterns shown in the outputs of all those example notebooks

Logan M

Check out the second link I shared earlier, it links to the exact line

prashanth

The second link leads to the file itself, but not a specific line :/

Logan M

It loads very slowly. The line number is in the link if it's not scrolling to it: line 60

prashanth

The first link says line 67, and that doesn't have the ->

prashanth

Line 60 doesn't seem to have anything related to the str representation

prashanth

Oh my bad! Sorry about that

prashanth

Got lost in URL soup 🤦🏽‍♂️

prashanth

Found it, thanks

Logan M

👍👍
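For readers following along, the question of where the -> comes from can be illustrated with a small sketch. It assumes the retriever builds each result's text by joining the string forms of the source, relation and target with " -> ", which is consistent with the output quoted in the question. SketchItem, triplet_to_text and the include_properties flag are hypothetical names chosen for this example, not the library's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class SketchItem:
    """Hypothetical node or relation with an id and optional properties."""

    id: str
    properties: Dict[str, Any] = field(default_factory=dict)

    def __str__(self) -> str:
        # Assumed behaviour: the string form carries the properties along.
        if self.properties:
            return f"{self.id} ({self.properties})"
        return self.id


Triplet = Tuple[SketchItem, SketchItem, SketchItem]


def triplet_to_text(triplet: Triplet, include_properties: bool = False) -> str:
    """Join source, relation and target with ' -> '."""
    source, relation, target = triplet
    if include_properties:
        # Uses each item's full string form, metadata included.
        parts = [str(source), str(relation), str(target)]
    else:
        # Uses only the identifiers, giving the compact form.
        parts = [source.id, relation.id, target.id]
    return " -> ".join(parts)


triplet = (
    SketchItem("Marie Curie", {"label": "PERSON"}),
    SketchItem("DISCOVERED"),
    SketchItem("radium"),
)

print(triplet_to_text(triplet))
print(triplet_to_text(triplet, include_properties=True))
```

Formatting from the ids rather than the full string form is one way to get the compact Marie Curie -> DISCOVERED -> radium output without touching how nodes print themselves elsewhere.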

prashanth

Ok, I submitted a PR after running some tests. It works well on my end after running the example notebooks for multiple DBs. Not sure why the tests are failing in CI though:
https://github.com/run-llama/llama_index/pull/16100
