Graph rag

At a glance

The post asks whether any insights from GraphRAG are being incorporated into PropertyGraphIndex now that the GraphRAG source is available. One community member describes the GraphRAG codebase as "extremely rough" and reports that it used 180k tokens to answer one question, which suggests it is not efficient. They also note that the release lacks some of the more interesting features, such as entity resolution. The original poster did not find much detail on its ranking approach and had to implement their own app-specific entity resolution. The poster also points out that the local-search system prompt tells the model to list only the top 5 most relevant record IDs and add "+more" to indicate that there are more, which appears to be the relevance mechanism. Another community member counters that the high token count likely applies to the one-time indexing step and considers the resulting parquet outputs useful. Overall, opinions are split: one view is that GraphRAG is not yet useful or production-ready, the other that its hierarchical graph metadata is worth supporting in LlamaIndex.

gavindoughtie

Any insights making their way into PropertyGraphIndex now that the GraphRAG source is available? https://microsoft.github.io/graphrag/

9 comments

Logan M

I tried looking at their codebase. But it's extremely rough.

It also used 180k tokens to answer one question 😅 so I don't think it's efficient either

Logan M

My take is it's not useful right now. And far from production ready

Logan M

They also didn't include some of the more interesting stuff like entity resolution

gavindoughtie

I didn't see a lot of detail on their approach to ranking either; what does LlamaIndex do for that?

Logan M

so far nothing either for entity resolution, which is why I was interested haha

gavindoughtie

I had to write some very app-specific entity resolution that parsed the triple strings. Agree we could do better
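For readers wondering what parsing triple strings for entity resolution might look like, here is a minimal sketch. It is not the code described above and not part of LlamaIndex or GraphRAG. The `subject -> relation -> object` string format, the function names, and the normalization rules (lowercasing, stripping punctuation) are all assumptions chosen for illustration; a real app would need rules specific to its own data.

```python
import re

def parse_triple(triple: str, sep: str = "->") -> tuple[str, str, str]:
    """Split a 'subject -> relation -> object' string (assumed format)."""
    parts = [p.strip() for p in triple.split(sep)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {triple!r}")
    return parts[0], parts[1], parts[2]

def normalize(name: str) -> str:
    """Hypothetical matching key: lowercase, no punctuation, single spaces."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def resolve_entities(triples: list[str]) -> list[tuple[str, str, str]]:
    """Map entity names that share a normalized key to one canonical name.

    The first surface form seen for a key becomes the canonical name.
    """
    canonical: dict[str, str] = {}
    resolved = []
    for t in triples:
        subj, rel, obj = parse_triple(t)
        subj_c = canonical.setdefault(normalize(subj), subj)
        obj_c = canonical.setdefault(normalize(obj), obj)
        resolved.append((subj_c, rel, obj_c))
    return resolved

if __name__ == "__main__":
    for row in resolve_entities([
        "Microsoft -> released -> GraphRAG",
        "microsoft -> maintains -> Graph-RAG",
    ]):
        print(row)
```

String normalization like this only merges trivial spelling variants; aliases such as "MSFT" versus "Microsoft" would need an alias table or embedding similarity, which is presumably why this ends up app-specific.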

gavindoughtie

Oh. Looking at https://github.com/microsoft/graphrag/blob/main/graphrag/query/structured_search/local_search/system_prompt.py

I see: "Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more." in their system prompt. So... THAT's the relevance mechanism.
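To make the contrast concrete: in the quoted prompt, the LLM itself is asked to pick the "top 5 most relevant" record ids. A deterministic version of the same rule would need explicit scores. The sketch below is an illustration only, not GraphRAG code; the function name, the `(record_id, score)` input, and the output format are assumptions.

```python
def format_record_ids(scored_ids: list[tuple[int, float]], limit: int = 5) -> str:
    """List the `limit` highest-scoring record ids, then '+more' if any remain.

    `scored_ids` holds hypothetical (record_id, relevance_score) pairs; where
    the scores come from is exactly what the prompt leaves to the model.
    """
    ranked = sorted(scored_ids, key=lambda pair: pair[1], reverse=True)
    shown = [str(record_id) for record_id, _ in ranked[:limit]]
    if len(ranked) > limit:
        shown.append("+more")
    return ", ".join(shown)

if __name__ == "__main__":
    scores = [(2, 0.91), (7, 0.85), (34, 0.80), (46, 0.74), (64, 0.70), (65, 0.12)]
    print(format_record_ids(scores))
```

With explicit scores the cutoff is reproducible; with the prompt-only approach, which ids count as "most relevant" is left to the model.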

Logan M

😅

jimmy6dof

I think that high token count applies to the index-creation step, the first time you digest a RAG doc set. After it is set up and you get these 10 or so parquet files, it's kind of DIY how you want to use the graph as metadata, so token count is up to you and how you set up the YAML config or edit prompts to fit your data. Entities (raw and cleaned), nodes, and relationships each have their own parquet output index. I think this is super useful, especially if it's possible to shift to a local model at some point. Hierarchy like GraphRAG or RAPTOR or similar is important metadata for hybrid search (the graph can also be part of ranking, etc.), so please do keep this possibility open for LlamaIndex.
