
Updated 3 months ago

Nodes

@Logan M I think you might be the best person to point me to where I can look in the code base to fix this. I implemented the Kùzu Property Graph Index, and there's one thing that's been bugging me regarding the output of the .retrieve method of a graph retriever object. I'm trying to display all Llama nodes obtained as a result for a given retriever query string.
nodes = kg_index.as_retriever(include_text=False).retrieve("Marie Curie")
for node in nodes:
    print(node.text)

Here's what I'd expect to get (based on the outputs for other graph DBs):
Marie Curie -> DISCOVERED -> radium

But what I actually get is below. Note how the information is buried in there - it's just that a lot of other JSON fields are converted to a string and concatenated with the (src)-[rel]->(target) string output.
Marie Curie ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 13), 'last_modified_date': datetime.date(2024, 9, 13), 'file_name': 'curie.txt', 'file_path': '/code/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '4014e554-016d-4f90-b743-f093c7677fa5'}) -> DISCOVERED -> radium ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 13), 'last_modified_date': datetime.date(2024, 9, 13), 'file_name': 'curie.txt', 'file_path': '/code/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '4014e554-016d-4f90-b743-f093c7677fa5'})

Is there somewhere I should be implementing a custom __str__ or __repr__ method so that my retriever outputs the string correctly? I know I'm missing something.
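As a stopgap on the caller's side, the property dumps can be stripped from the returned text after retrieval. The sketch below is not part of LlamaIndex or Kùzu: `strip_property_dumps` is a hypothetical helper, and it assumes every dump has the ` ({...})` shape shown in the output above and that parentheses inside it are balanced.

```python
def strip_property_dumps(text: str) -> str:
    """Remove every " ({...})" segment, leaving only the src -> rel -> target text.

    Hypothetical helper. Assumes parentheses inside each dump are balanced,
    as in datetime.date(2024, 9, 13).
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        if text.startswith(" ({", i):
            # Scan forward to the parenthesis that closes the one opening the dump.
            depth = 0
            j = i + 1
            while j < n:
                if text[j] == "(":
                    depth += 1
                elif text[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j < n:
                # Matching close found: skip the whole dump.
                i = j + 1
                continue
        # Ordinary character, or an unbalanced dump that is left untouched.
        out.append(text[i])
        i += 1
    return "".join(out)


raw = (
    "Marie Curie ({'id': 'Marie Curie', "
    "'creation_date': datetime.date(2024, 9, 13)}) "
    "-> DISCOVERED -> radium ({'id': 'radium'})"
)
print(strip_property_dumps(raw))
```

This only cleans up the display; it does not change how the node strings are built in the first place.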
24 comments
I don't know what the correct way to handle metadata is just yet
But PRs are welcome
Will take a look, shouldn't be hard to fix
I have a feeling that this is highly specific to Kùzu, because none of the other graph DBs seem to have anything in their implementations that customize the string representation of an entity node
Let me see what Kùzu is outputting, and it might be a custom fix in my case
hmm I don't think it's specific to Kùzu actually, I think it's just due to the base EntityNode class forcing metadata into the string representation
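To make that explanation concrete, here is a minimal sketch of how a `__str__` that appends properties leaks into the triplet text. The classes and the `triplet_to_text` function are hypothetical stand-ins written for illustration, not the actual LlamaIndex code; the only things taken from the thread are the observed output shape and the ` -> ` separator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchEntityNode:
    """Hypothetical stand-in for an entity node (not the LlamaIndex class)."""

    name: str
    label: str = "entity"
    properties: Dict[str, Any] = field(default_factory=dict)

    def __str__(self) -> str:
        # Mimics the behavior described above: metadata is appended when present.
        if self.properties:
            return f"{self.name} ({self.properties})"
        return self.name


class NameOnlyEntityNode(SketchEntityNode):
    """Variant whose string form ignores properties entirely."""

    def __str__(self) -> str:
        return self.name


def triplet_to_text(source: Any, relation_label: str, target: Any) -> str:
    # Assumed formatting step: str() is called on both nodes, so whatever
    # __str__ returns ends up in the retrieved text.
    return f"{source} -> {relation_label} -> {target}"


props = {"file_name": "curie.txt"}
noisy = triplet_to_text(
    SketchEntityNode("Marie Curie", "PERSON", props), "DISCOVERED", SketchEntityNode("radium")
)
clean = triplet_to_text(
    NameOnlyEntityNode("Marie Curie", "PERSON", props), "DISCOVERED", NameOnlyEntityNode("radium")
)
print(noisy)
print(clean)
```

Under these assumptions, a graph store that writes file metadata into node properties would show the noisy form, while one that leaves properties empty would show the clean form, which may be why the behavior looked specific to one database.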
Why do all the example notebooks on property graph index show the right formatting then? Has something changed in more recent versions of LlamaIndex?
Let me give this a try using Neo4j and see if the same thing happens
probably this was updated after those notebooks were run
@Logan M where in the code does the -> get added? Maybe it's the level of abstraction (or maybe it's just me), but I can't seem to work out how or where the string Interleaf -> Was -> Company gets its ->
The retriever.retrieve results from Neo4j seem to be outputting the chunk texts for nodes of type NodeWithScore, which seems to be a subclass of ChunkNode. So I can see why node.text outputs just the chunk's text. But I'm not able to find any part of the code in types.py that displays node.text with the edge patterns displayed as per the outputs shown in all those example notebooks
Check out the second link I shared earlier, it links to the exact line
The second link leads to the file itself, but not a specific line :/
It loads very slowly. If it's not scrolling to it, the line number is in the link: line 60
The first link says line 67, and that doesn't have the ->
Line 60 doesn't seem to have anything related to the str representation
Oh my bad! Sorry about that
Got lost in URL soup 🤦🏽‍♂️
Found it, thanks
Ok, I submitted a PR after running some tests. It works well on my end after running the example notebooks for multiple DBs, but I'm not sure why the tests are failing in CI:
https://github.com/run-llama/llama_index/pull/16100