
Updated 2 months ago

Mapping Unstructured Text to Knowledge Graphs: Exploring the State of the Art

At a glance

The post asks about the state of the art for mapping unstructured text to knowledge graphs (KGs) and what community members have found most useful or powerful. The comments discuss the challenges of using LLMs to extract graphs, such as high computation and token cost, storage complexities, and unsolved issues with entity linking and deduplication. One community member prefers graph-like approaches that use metadata tagging instead of involving graphs explicitly. They also describe the property graph index as essentially a knowledge graph index with more customization and, in their opinion, better code quality. Overall, that community member is skeptical about the current state of mapping unstructured text to KGs and considers it to be at a proof-of-concept stage.

@Logan M what's the state of the art for mapping unstructured text to KGs? What have you found most useful/powerful?
I haven't found it useful at all tbh. Huge computation/token cost, storage complexities, etc. I don't think it's worth the effort right now.

What I have found useful is graph-like approaches. Something like metadata tagging: if you retrieve some node from a section of text, you use its metadata to help fetch the full section. Things that retrieve by reference can be achieved without needing to involve graphs explicitly.
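For anyone who wants to see the metadata-tagging idea concretely, here is a minimal sketch in plain Python. It is not LlamaIndex code: the `Chunk` class, the `section_id` key, the sentence-level splitting, and the toy keyword scorer are all assumptions made up for illustration. A real setup would use an embedding retriever, but the "retrieve a small node, then follow its metadata back to the full section" step would look roughly the same.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A small retrievable piece of text tagged with where it came from (hypothetical type)."""
    text: str
    metadata: dict = field(default_factory=dict)


def build_chunks(sections: dict[str, str]) -> list[Chunk]:
    """Split each section into sentence-sized chunks, tagging each with its section id."""
    chunks = []
    for section_id, body in sections.items():
        for sentence in body.split(". "):
            chunks.append(Chunk(sentence, {"section_id": section_id}))
    return chunks


def retrieve(query: str, chunks: list[Chunk]) -> Chunk:
    """Toy scorer: pick the chunk sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words & set(c.text.lower().split())))


def expand_to_section(chunk: Chunk, sections: dict[str, str]) -> str:
    """Follow the metadata reference back to the full section text."""
    return sections[chunk.metadata["section_id"]]


# Made-up example data.
SECTIONS = {
    "billing": "Invoices are issued monthly. Refunds take five business days to process after approval.",
    "auth": "Tokens expire after one hour. Refresh tokens last thirty days.",
}

chunks = build_chunks(SECTIONS)
hit = retrieve("how long do refunds take", chunks)
full = expand_to_section(hit, SECTIONS)
```

The small chunk is what gets matched, and the section id in its metadata is what pulls in the surrounding context, so no explicit graph store is involved.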
Huh, interesting. So your appreciation for this https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/ might not be that great? Also are you differentiating b/t a KG and a property graph (https://docs.llamaindex.ai/en/stable/module_guides/indexing/lpg_index_guide/) ?

Not doubting you, just trying to ensure I understand clearly. Thanks.
Property graph index is basically a knowledge graph index, but updated with more customization (and in my opinion, better code quality)

My opinion is that extracting graphs with LLMs is pretty brittle. It costs a ton of tokens, time, and is hard to scale as your dataset grows.

There's also the very valid (and unsolved) issue of entity linking and deduplication, as well as how to best leverage the graph for retrieval.
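To make the deduplication problem concrete, here is a deliberately naive sketch in plain Python. The `normalize` rules, the suffix list, and the example triples are assumptions for illustration only and are not taken from any library. It merges surface forms that differ only by case, punctuation, or a company suffix, and it shows where that stops working.

```python
import re


def normalize(name: str) -> str:
    """Naive canonical key: lowercase, drop punctuation and common company suffixes."""
    name = name.lower().strip()
    name = re.sub(r"[^\w\s]", "", name)
    name = re.sub(r"\b(inc|corp|corporation|ltd|llc)\b", "", name)
    return re.sub(r"\s+", " ", name).strip()


def dedupe_triples(triples):
    """Map each entity to the first surface form seen for its normalized key."""
    canonical = {}  # normalized key -> first surface form seen
    merged = set()
    for subj, rel, obj in triples:
        subj_c = canonical.setdefault(normalize(subj), subj)
        obj_c = canonical.setdefault(normalize(obj), obj)
        merged.add((subj_c, rel, obj_c))
    return sorted(merged)


# Made-up extraction output with three surface forms for the same company.
triples = [
    ("Microsoft", "released", "GraphRAG"),
    ("microsoft corp.", "released", "GraphRAG"),
    ("MSFT", "released", "GraphRAG"),
]
deduped = dedupe_triples(triples)
```

Here "microsoft corp." collapses into "Microsoft", but "MSFT" survives as a separate entity, because string normalization alone cannot resolve aliases. Catching that needs an alias table, embeddings, or another LLM pass, which is part of why this gets expensive and brittle at scale.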
This is just my take based on my experience (and I actually wrote the property graph index lol)

It's a cool tech demo, but I think it's very much in a PoC stage right now
Awesome, thanks for the feedback.

I just tried GraphRAG directly from MSFT. It looks ... reasonable. It created the entities from the content and related them but I don't see edge labels at all. Time to go do more research. Thanks, @Logan M !