Anyone have experience loading RDF data into Llama index?

At a glance

The community member is having issues loading RDF data into LlamaIndex. They have tried the GPTVectorStoreIndex, which works quickly but doesn't give the best results, and the KnowledgeGraphIndex, which takes a long time and a lot of resources; the resulting knowledge graph also seems to include only some of the entities.

Another community member suggests using the RDF Loader, which is the recommended way to load a pre-made knowledge graph into LlamaIndex, while the KnowledgeGraphIndex is the recommended way to create a knowledge graph from unstructured data.

The community members discuss a notebook that combines a knowledge graph and a vector index, but the original poster clarifies that they are trying to use structured data (RDF) to create a knowledge graph and then use that as context information for LlamaIndex; they believe the correct approach is to use the RDFReader and ignore the KGIndex functionality entirely.

Another community member agrees that this may be the best approach, and the original poster shares a blog post they wrote about their experience. A final community member suggests that the poor performance of the initial index approach may be due to the default CSV reader not being great, and they encourage the original poster to contribute any ideas for improving the RDFReader.

Anyone have experience loading RDF data into Llama index? I can use GPTVectorStoreIndex to index the raw RDF (or ttl or JSON-LD) file and that works quickly but doesn't give the best results. The KnowledgeGraphIndex takes super long and takes a lot of resources and I don't know that the results are that much better. Also, when I visualize the knowledge graph that KnowledgeGraphIndex creates it seems like it only took some of the entities in the KG as nodes in the LLM. Any help/guidance would be greatly appreciated.
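
For context, here is a minimal sketch of the two approaches described above (not code from the thread), assuming a llama_index release from around this thread's era (~0.6.x) and an OPENAI_API_KEY in the environment; the file path and parameters are illustrative:

```python
# Rough sketch, not code from the thread. Assumes llama_index ~0.6.x and an
# OPENAI_API_KEY in the environment; the file path is illustrative.
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, KnowledgeGraphIndex

# Read the raw Turtle file as plain-text documents.
documents = SimpleDirectoryReader(input_files=["data/graph.ttl"]).load_data()

# Approach 1: vector index over the raw file -- fast, but answer quality
# depends on how the text happens to be chunked.
vector_index = GPTVectorStoreIndex.from_documents(documents)

# Approach 2: have the LLM re-extract triplets from the text -- slow and
# costly, and it may capture only a subset of the entities already in the graph.
kg_index = KnowledgeGraphIndex.from_documents(documents, max_triplets_per_chunk=2)

print(vector_index.as_query_engine().query("What entities are described?"))
```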
8 comments
I see that there is an RDF Loader (https://github.com/emptycrown/llama-hub/tree/main/llama_hub/file/rdf)

Am I right in assuming that this is the recommended way to load a pre-made knowledge graph into Llama index, and that the KnowledgeGraphIndex is the recommended way of CREATING a KG with llama using unstructured data?
Yea that's the correct way of thinking about it I think

While you can initialize the KG index with pre-made triplets, the performance is likely.. just alright haha
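
For illustration, a hedged sketch of both options mentioned here: loading a pre-made graph with the llama-hub RDFReader, and seeding a KnowledgeGraphIndex directly with known triplets (assuming llama_index ~0.6.x; the file path and triplet are made up):

```python
# Illustrative only -- not code from the thread. Assumes llama_index ~0.6.x
# and the llama-hub RDF loader linked above; path and triplet are made up.
from pathlib import Path
from llama_index import KnowledgeGraphIndex, download_loader

# Load a pre-made knowledge graph file into Document objects.
RDFReader = download_loader("RDFReader")
documents = RDFReader().load_data(file=Path("data/graph.ttl"))

# Alternatively, initialize an empty KG index and upsert pre-made triplets.
kg_index = KnowledgeGraphIndex([])
kg_index.upsert_triplet(("Alice", "knows", "Bob"))
```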

There is an interesting notebook here that combines a KG and vector index
https://github.com/jerryjliu/llama_index/blob/main/docs/examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.ipynb
Thank you so much @Logan M!

It looks like that notebook is focused on building a KG using unstructured data, is that correct? I am trying to use structured data to create a KG (using RDFLib/python, not Llama) and then use that RDF data as context info for Llama. I believe the correct approach is then to just use the RDFReader and ignore the KGIndex functionality in Llama entirely, is that correct?
It may be the best approach then, yes!
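
A minimal sketch of that approach, with the graph built outside LlamaIndex (e.g. with RDFLib), read in via the RDFReader, and indexed with a plain vector index (assuming llama_index ~0.6.x; the path and query are illustrative):

```python
# Illustrative sketch of the RDFReader + vector index approach discussed here;
# not code from the thread. Assumes llama_index ~0.6.x; path/query are made up.
from pathlib import Path
from llama_index import GPTVectorStoreIndex, download_loader

RDFReader = download_loader("RDFReader")
documents = RDFReader().load_data(file=Path("data/graph.ttl"))

# Index the triplet documents with a plain vector index; no KG extraction step.
index = GPTVectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("How are the main entities related?"))
```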
Interesting post!

I think one reason for the poor performance on the initial index approach is that our default csv reader is not great. I'm pretty sure it just splits each row into a document/node πŸ˜…
That makes sense! Are there plans to improve the RDFReader so that it works better with RDF data?
Not at the moment, but if you have ideas for better loading it would be a great PR to llama hub! πŸ’ͺπŸ™