The community member is seeking resources to learn about GraphRAG and how to deploy it in production. They are specifically interested in using Neo4j as the database and want to understand how to add and remove documents without recreating the entire community structure. The comments indicate that the community member runs an edtech platform and wants to extend their existing RAPTOR-based solution from individual files to an entire course, but RAPTOR does not scale well beyond a few files. They believe GraphRAG could be a solution, but are concerned about the expense of recreating communities with each new file upload. The community member mentions an open PR in Microsoft GraphRAG that addresses this case and is open to other ideas for making GraphRAG production-ready for dynamic file updates.
Hello, what are the best resources to learn about GraphRAG and put it into production? Also, is Neo4j the best DB for deploying a GraphRAG solution? If so, how exactly does adding and removing documents work? I can’t seem to find much info on document/chunk insertion and deletion.
Yes! So I run an edtech platform, and we currently have SOTA RAG for individual files using an improved RAPTOR implementation. We want to scale this approach to the entire course as opposed to just one file. However, RAPTOR does not seem to scale well beyond a few files. This is where GraphRAG comes in handy. The issue with GraphRAG, though, is that you have to recreate the communities every time a new file is uploaded, which is of course extremely expensive. So I want to figure out a way to run entity detection once and then cheaply cluster the new files into the existing communities.
@biswaroop I know there is currently a PR open in Microsoft GraphRAG for this exact case, but I would love to see if LlamaIndex has some ideas too. This is critical for bringing GraphRAG to production when files are dynamic.
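To make the idea above concrete (run entity detection once, then attach new files to the existing communities instead of re-clustering), here is a minimal sketch in plain Python. It is not the approach from the Microsoft GraphRAG PR and not a LlamaIndex or Neo4j API. It swaps full re-clustering for a simple greedy neighbor vote, and the function name `attach_new_entities`, the integer community ids, and the `(source, target, weight)` edge format are all assumptions made for illustration.

```python
from collections import defaultdict


def attach_new_entities(community_of, new_edges):
    """Greedy neighbor vote (hypothetical helper, not a library API).

    community_of: dict mapping already-clustered entity -> int community id.
    new_edges: iterable of (source, target, weight) extracted from the new file.
    Returns a new dict that also covers the entities seen in new_edges.
    """
    assignment = dict(community_of)  # leave the caller's mapping untouched

    # Undirected adjacency built only from the new file's relationships.
    neighbors = defaultdict(dict)
    for src, dst, weight in new_edges:
        neighbors[src][dst] = neighbors[src].get(dst, 0.0) + weight
        neighbors[dst][src] = neighbors[dst].get(src, 0.0) + weight

    pending = [e for e in neighbors if e not in assignment]
    next_id = max(assignment.values(), default=-1) + 1

    while pending:
        still_pending = []
        progressed = False
        for entity in pending:
            # Sum edge weight toward each community that is already known.
            votes = defaultdict(float)
            for other, weight in neighbors[entity].items():
                if other in assignment:
                    votes[assignment[other]] += weight
            if votes:
                # Highest total weight wins; ties go to the lowest id.
                assignment[entity] = max(votes, key=lambda c: (votes[c], -c))
                progressed = True
            else:
                still_pending.append(entity)
        if not progressed:
            # Nothing left touches a known community: seed a new one.
            seed = still_pending.pop(0)
            assignment[seed] = next_id
            next_id += 1
        pending = still_pending
    return assignment


# Usage with made-up course entities.
community_of = {"photosynthesis": 0, "chlorophyll": 0, "mitosis": 1, "chromosome": 1}
new_edges = [
    ("calvin cycle", "photosynthesis", 2.0),
    ("calvin cycle", "mitosis", 0.5),
    ("rubisco", "calvin cycle", 1.0),
    ("spindle", "chromosome", 1.0),
    ("ledger", "invoice", 1.0),
]
updated = attach_new_entities(community_of, new_edges)
print({e: c for e, c in updated.items() if e not in community_of})
# {'calvin cycle': 0, 'rubisco': 0, 'spindle': 1, 'ledger': 2, 'invoice': 2}
```

Only the communities that gained members would need their summaries regenerated, which is where the cost saving would come from. A greedy vote like this will likely drift away from what a full clustering pass would produce, so an occasional full rebuild is probably still needed.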