
Updated 2 years ago

When using GPTKnowledgeGraphIndex with MongoIndexStore

At a glance

A community member hit a `pymongo.errors.DocumentTooLarge` error when using GPTKnowledgeGraphIndex with MongoIndexStore: the embeddings are all stored in a single Mongo document, which grows too large. They are experimenting with a knowledge graph over a few hundred pages of text and need to store the triplets and their embeddings. Responders say more scalable storage is planned, point to a pull request that may address the issue, and suggest persisting the index to S3 or Google buckets instead of Mongo as an interim solution.

When using GPTKnowledgeGraphIndex with MongoIndexStore and include_embeddings=True, I am running into an error with Mongo:

pymongo.errors.DocumentTooLarge: 'update' command document too large

I guess the reason for that is that all the embeddings are stored in one Mongo document. My use case: I am experimenting with a knowledge graph, have a couple of hundred pages of text, and need to store the triplets and their embeddings somehow.

Are there any plans to make the storage more scalable, or is there a better way to achieve this and I am going at it from a totally wrong angle? Thank you!
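The arithmetic behind the error can be sketched with a back-of-envelope estimate. The numbers below are illustrative assumptions, not values from the thread: a 1536-dimension embedding and roughly 10 bytes per serialized float.

```python
# Rough estimate of why keeping every embedding in a single Mongo document
# overruns MongoDB's 16 MB BSON document size limit. Dimension and
# bytes-per-float are illustrative assumptions.

MONGO_DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's per-document BSON limit

def payload_bytes(num_embeddings: int, dim: int = 1536,
                  bytes_per_float: int = 10) -> int:
    """Rough serialized size of num_embeddings embeddings in one document."""
    return num_embeddings * dim * bytes_per_float

# A few hundred pages of text can easily produce thousands of triplets.
for n in (100, 1_000, 5_000):
    size_mb = payload_bytes(n) / (1024 * 1024)
    status = "fits" if payload_bytes(n) <= MONGO_DOC_LIMIT else "too large"
    print(f"{n:>5} embeddings -> {size_mb:6.1f} MB ({status})")
```

Under these assumptions a few thousand triplet embeddings in one document already exceed the limit, which matches the error above.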
3 comments
Definitely plans to make this more scalable! Step one is finally merging this PR at some point lol

https://github.com/jerryjliu/llama_index/pull/2581
For now, you may want to look into using S3 or Google buckets to persist the index instead of MongoDB

https://gpt-index.readthedocs.io/en/latest/how_to/storage/save_load.html#using-a-remote-backend
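A minimal sketch of that remote-backend approach, following the guide linked above. The data directory, bucket path, and index id are placeholders, and AWS credentials must already be available in the environment, so treat this as a sketch rather than a drop-in script.

```python
# Hedged sketch: persist a knowledge graph index to S3 via s3fs instead of
# MongoDB, per the "Using a Remote Backend" docs. "my-bucket/kg_storage",
# "./data", and "kg_index" are placeholder names, not values from the thread.
import s3fs
from llama_index import (
    GPTKnowledgeGraphIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

s3 = s3fs.S3FileSystem()  # picks up AWS credentials from the environment

# Build the index as before (hypothetical local data directory).
documents = SimpleDirectoryReader("./data").load_data()
index = GPTKnowledgeGraphIndex.from_documents(documents, include_embeddings=True)

# Save: give the index a stable id, then persist its storage context to the
# remote filesystem instead of Mongo.
index.set_index_id("kg_index")
index.storage_context.persist(persist_dir="my-bucket/kg_storage", fs=s3)

# Load: rebuild the storage context from the same remote path later.
storage_context = StorageContext.from_defaults(
    persist_dir="my-bucket/kg_storage", fs=s3
)
index = load_index_from_storage(storage_context, index_id="kg_index")
```

Since the whole index is serialized as files under the given path rather than as a single database document, the 16 MB per-document limit no longer applies.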
That PR looks exactly like what I need 🙂 really hope you get to merging it soon 😄 thank you for the tip, will give that a try