I created a knowledge graph index from nodes, but it's not working. It says it can't find relationships. When I create them using from_documents(), they work perfectly, but when I do it from nodes, it doesn't find the relationships. The graph seems to be created properly since I can visualize it, and the triplets are reflected in the database. Any ideas?
Is it because the query isn't finding any keywords that match your inserted triplets?
I don't believe that's the case. I've asked specific questions about the triplets from graph_store.json and haven't found any relationships. However, I understand that it might be an issue with the construction process because when I create the index from documents using from_documents(), everything works perfectly. This time, I created it from nodes of vector indices that I already have, and I encounter this problem. This is how I created it:

kg_index = KnowledgeGraphIndex(
    nodes,
    max_triplets_per_chunk=5,
    storage_context=storage_context,
    show_progress=True,
)

Thank you for your assistance in resolving this matter.
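To make the debugging concrete, here is a minimal toy sketch of the construction path under discussion (chunk → extract triplets → store by subject). This is not the LlamaIndex internals; the `extract_triplets` function is a hypothetical stand-in for the LLM-based extractor, and the dict-of-lists graph store is an assumption:

```python
# Toy sketch (NOT LlamaIndex internals): build a tiny knowledge-graph
# index from pre-chunked nodes, capping triplets per chunk.
def extract_triplets(text, max_triplets=5):
    # Hypothetical extractor: treat "A -> rel -> B" lines as triplets.
    triplets = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("->")]
        if len(parts) == 3:
            triplets.append(tuple(parts))
    return triplets[:max_triplets]

def build_graph(nodes, max_triplets_per_chunk=5):
    # Graph store: subject -> list of (relation, object) pairs.
    graph = {}
    for text in nodes:
        for subj, rel, obj in extract_triplets(text, max_triplets_per_chunk):
            graph.setdefault(subj, []).append((rel, obj))
    return graph

graph = build_graph(["Alice -> works_at -> Acme\nAcme -> located_in -> Berlin"])
```

If construction really succeeded (as the visualization suggests), the subjects should be present in the store, and the problem shifts to the retrieval side.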
oh, that should work fine I think 😅 What do your nodes look like? Just small text chunks?
[Attachments: image.png, image.png]
case matching I guess
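The case-matching point can be illustrated with a tiny sketch: if keyword lookup against the graph store is an exact string match (an assumption about the retrieval path, not confirmed library behavior), a query keyword in a different case silently misses the triplet:

```python
# Sketch of why keyword matching can miss stored triplets: an exact
# string lookup won't hit "Acme" when the query keyword is "acme".
graph = {"Acme": [("located_in", "Berlin")]}

def retrieve(graph, keyword):
    # Case-sensitive: exact key match only.
    return graph.get(keyword, [])

def retrieve_ci(graph, keyword):
    # Case-insensitive variant: normalize both sides before lookup.
    lowered = {k.lower(): v for k, v in graph.items()}
    return lowered.get(keyword.lower(), [])

hit_exact = retrieve(graph, "acme")   # misses
hit_ci = retrieve_ci(graph, "acme")   # hits
```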
The KG index is kind of 💩 ngl, for quite a few reasons

(And hot take, but you probably don't need it)
kind of curious about the reasons and why you don't need it 😄
I just think scaling an LLM to build a knowledge graph isn't feasible (right now, in terms of speed, costs, controllability)

And the algorithms for LLMs to use KGs kind of suck. Sending triplets to an LLM usually doesn't work well (very simple facts can be expressed in a ton of triplets), although this side is slightly more useful (assuming you already have a KG)

Top-k + hybrid + reranking will usually scale much better

Of course, there will always be cases where KGs work, I just think it's very few. Although KGs have a tight/vocal community, so at the same time, supporting them is good thought leadership 😆
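The "top-k + hybrid + reranking" pipeline mentioned above can be sketched in a few lines. This is a toy illustration, not any library's API: the sparse score is plain keyword overlap, the dense scores are assumed precomputed (e.g. from embeddings), and the "reranker" is a stand-in for what would normally be a cross-encoder:

```python
# Toy "top-k + hybrid + reranking": blend a dense similarity with a
# sparse keyword-overlap score, take the top-k, then rerank.
def sparse_score(query, doc):
    # Fraction of query words that appear in the doc (bag-of-words).
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def hybrid_search(query, docs, dense_scores, k=2, alpha=0.5):
    # alpha weights dense vs. sparse; dense_scores are precomputed.
    scored = [
        (alpha * dense_scores[i] + (1 - alpha) * sparse_score(query, doc), doc)
        for i, doc in enumerate(docs)
    ]
    top_k = sorted(scored, reverse=True)[:k]
    # "Rerank" step: a cross-encoder in practice; keyword overlap here.
    return [doc for _, doc in sorted(top_k, key=lambda p: -sparse_score(query, p[1]))]

docs = ["patient visit notes", "billing summary"]
result = hybrid_search("patient notes", docs, dense_scores=[0.9, 0.2])
```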
And the algorithms for LLMs to use KGs kind of suck.
You mean the general LLMs are terrible at receiving triplets as context and it's way more tokens compared to typical node chunks?

When you say hybrid, are you talking about dense / sparse vector hybrid or dense / sql keyword hybrid?

Right now I'm working in the medical domain and was thinking a KG could help track the evolution of a patient's journey. With vector search, time doesn't seem to matter even when it's part of the metadata; at least I don't know how to get it to retrieve in a time-aware way. But as you mentioned, the speed and costs make it seem hard.
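One simple way to make retrieval time-aware without a KG is to filter candidate nodes on a timestamp metadata field before scoring, then order the survivors chronologically so the LLM sees the journey in sequence. A minimal sketch, assuming each node is a dict with `"text"` and `"date"` keys (a hypothetical shape, not a specific framework's node type):

```python
from datetime import date

# Filter nodes to a date window via metadata, then sort chronologically.
def time_window(nodes, start, end):
    hits = [n for n in nodes if start <= n["date"] <= end]
    return sorted(hits, key=lambda n: n["date"])

notes = [
    {"text": "ER visit", "date": date(2024, 1, 5)},
    {"text": "follow-up", "date": date(2024, 3, 2)},
    {"text": "lab work", "date": date(2024, 2, 10)},
]
window = time_window(notes, date(2024, 1, 1), date(2024, 2, 28))
```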
Yea, just feeding triplets as context can be confusing. It's fewer tokens (since triplets are fairly compact), but I think in most cases it confuses the LLM
Yea, when I say hybrid I mean dense + sparse
Patient records/journey is an interesting task 👀 I think it depends on how linear these records are (it feels fairly linear, and something that you could summarize? But not sure)
Yeah, it's an interesting problem. Summarizing is easiest, but if it's 2-3 months' worth of data, that's too many LLM calls. And there are a lot of unimportant notes/details that you don't want it to summarize.
I have implemented RouterRetriever with approximately 20 tools, then I rerank, and finally, I add the KG retrieve. The RouterRetriever itself gave excellent results, but in certain cases, the KG was able to improve the response, and it also helps me generate graphs for each response, which is visually helpful.

Regarding the initial query I made, I ended up rebuilding the KG with Nebula from documents. I couldn't solve the one I made from nodes, haha.