Anyone have experience loading RDF data into Llama index?

At a glance

The community member is having issues loading RDF data into LlamaIndex. They have tried the GPTVectorStoreIndex, which works quickly but doesn't give the best results, and the KnowledgeGraphIndex, which takes a long time and a lot of resources; the resulting knowledge graph also seems to include only some of the entities.

Another community member suggests using the RDF Loader, which is the recommended way to load a pre-made knowledge graph into LlamaIndex, while the KnowledgeGraphIndex is the recommended way to create a knowledge graph from unstructured data.

The community members discuss a notebook that combines a knowledge graph and a vector index, but the original poster clarifies that they are trying to use structured data (RDF) to create a knowledge graph and then use that as context information for LlamaIndex; they believe the correct approach is to use the RDFReader and ignore the KGIndex functionality entirely.

Another community member agrees that this may be the best approach, and the original poster shares a blog post they wrote about their experience. A final community member suggests that the poor performance of the initial index approach may be due to the default CSV reader not being great, and they encourage the original poster to contribute any ideas for improving the RDFReader.

Anyone have experience loading RDF data into Llama index? I can use GPTVectorStoreIndex to index the raw RDF (or ttl or JSON-LD) file and that works quickly but doesn't give the best results. The KnowledgeGraphIndex takes super long and takes a lot of resources and I don't know that the results are that much better. Also, when I visualize the knowledge graph that KnowledgeGraphIndex creates it seems like it only took some of the entities in the KG as nodes in the LLM. Any help/guidance would be greatly appreciated.
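
For context, here is a minimal sketch of the two approaches described above (not code from the thread), assuming a llama_index release from around this thread's era (~0.6.x) and an OPENAI_API_KEY in the environment; the file path and parameters are illustrative:

```python
# Rough sketch, not code from the thread. Assumes llama_index ~0.6.x and an
# OPENAI_API_KEY in the environment; the file path is illustrative.
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, KnowledgeGraphIndex

# Read the raw Turtle file as plain-text documents.
documents = SimpleDirectoryReader(input_files=["data/graph.ttl"]).load_data()

# Approach 1: vector index over the raw file -- fast, but answer quality
# depends on how the text happens to be chunked.
vector_index = GPTVectorStoreIndex.from_documents(documents)

# Approach 2: have the LLM re-extract triplets from the text -- slow and
# costly, and it may capture only a subset of the entities already in the graph.
kg_index = KnowledgeGraphIndex.from_documents(documents, max_triplets_per_chunk=2)

print(vector_index.as_query_engine().query("What entities are described?"))
```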
8 comments
I see that there is an RDF Loader (https://github.com/emptycrown/llama-hub/tree/main/llama_hub/file/rdf)

Am I right in assuming that this is the recommended way to load a pre-made knowledge graph into Llama index, and that the KnowledgeGraphIndex is the recommended way of CREATING a KG with llama using unstructured data?
Yea that's the correct way of thinking about it I think

While you can initialize the KG index with pre-made triplets, the performance is likely.. just alright haha
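
For illustration, a hedged sketch of both options mentioned here: loading a pre-made graph with the llama-hub RDFReader, and seeding a KnowledgeGraphIndex directly with known triplets (assuming llama_index ~0.6.x; the file path and triplet are made up):

```python
# Illustrative only -- not code from the thread. Assumes llama_index ~0.6.x
# and the llama-hub RDF loader linked above; path and triplet are made up.
from pathlib import Path
from llama_index import KnowledgeGraphIndex, download_loader

# Load a pre-made knowledge graph file into Document objects.
RDFReader = download_loader("RDFReader")
documents = RDFReader().load_data(file=Path("data/graph.ttl"))

# Alternatively, initialize an empty KG index and upsert pre-made triplets.
kg_index = KnowledgeGraphIndex([])
kg_index.upsert_triplet(("Alice", "knows", "Bob"))
```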

There is an interesting notebook here that combines a KG and vector index
https://github.com/jerryjliu/llama_index/blob/main/docs/examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.ipynb
Thank you so much @Logan M!

It looks like that notebook is focused on building a KG using unstructured data, is that correct? I am trying to use structured data to create a KG (using RDFLib/python, not Llama) and then use that RDF data as context info for Llama. I believe the correct approach is then to just use the RDFReader and ignore the KGIndex functionality in Llama entirely, is that correct?
It may be the best approach then, yes!
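
A minimal sketch of that approach, with the graph built outside LlamaIndex (e.g. with RDFLib), read in via the RDFReader, and indexed with a plain vector index (assuming llama_index ~0.6.x; the path and query are illustrative):

```python
# Illustrative sketch of the RDFReader + vector index approach discussed here;
# not code from the thread. Assumes llama_index ~0.6.x; path/query are made up.
from pathlib import Path
from llama_index import GPTVectorStoreIndex, download_loader

RDFReader = download_loader("RDFReader")
documents = RDFReader().load_data(file=Path("data/graph.ttl"))

# Index the triplet documents with a plain vector index; no KG extraction step.
index = GPTVectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("How are the main entities related?"))
```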
Interesting post!

I think one reason for the poor performance on the initial index approach is that our default csv reader is not great. I'm pretty sure it just splits each row into a document/node πŸ˜…
That makes sense! Are there plans to improve the RDFReader so that it works better with RDF data?
Not at the moment, but if you have ideas for better loading it would be a great PR to llama hub! πŸ’ͺπŸ™