Updated 9 months ago

KnowledgeGraphIndex in LlamaIndex seems to be broken. Here is the complete code to generate a knowledge graph:

Plain Text
from llama_index.core import Document, KnowledgeGraphIndex

text_chunks = ["cars are red", "cars are big", "elephants are red"]

doc_chunks = []
for i, text in enumerate(text_chunks):
    doc = Document(text=text, id_=f"doc_id_{i}")
    doc_chunks.append(doc)

# storage_context, space_name, edge_types, rel_prop_names, tags,
# llm, and embed_model are set up earlier (NebulaGraph configuration)
kg_index = KnowledgeGraphIndex.from_documents(
    documents=doc_chunks,
    storage_context=storage_context,
    max_triplets_per_chunk=3,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    llm=llm,
    embed_model=embed_model,
    include_embeddings=True,
)

import matplotlib.pyplot as plt
import networkx as nx

G = kg_index.get_networkx_graph()
nx.draw(G, with_labels=True, font_weight='bold')
plt.show()


And I get this:
[rendered plot: a tangle of nodes labeled with single characters such as "1", "b", "L", "d", and one empty node]
why are there so many nodes with just one character inside, and even an empty one?
probably it's not being generated/parsed properly

If you explored the graph in nebula (I think you are using nebula?) it would probably make more sense
is there an easy way to plot it directly from nebula?
doesn't nebula have a GUI?
I thought it had some kind of dashboard
yes, it does. I was asking whether there is a function to plot the graph (from nebula) directly in a Jupyter notebook, but I'll use the dashboard, thanks
The thing is, I'm fairly sure something is wrong with the graph it generates: when I try building it from a PDF file (about one page of the Wikipedia article about Europe), I get

Plain Text
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[70], line 2
      1 hybrid_query_engine = kg_index.as_query_engine(llm=llm, embed_model=embed_model)
----> 2 response = hybrid_query_engine.query("Tell me about Europe.")
      3 print(response)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
     38 if isinstance(str_or_query_bundle, str):
     39     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py:186, in RetrieverQueryEngine._query(self, query_bundle)
    182 """Answer a query."""
    183 with self.callback_manager.event(
    184     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    185 ) as query_event:
--> 186     nodes = self.retrieve(query_bundle)
    187     response = self._response_synthesizer.synthesize(
    188         query=query_bundle,
    189         nodes=nodes,
    190     )
    192     query_event.on_end(payload={EventPayload.RESPONSE: response})

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py:142, in RetrieverQueryEngine.retrieve(self, query_bundle)
...
--> 177 keyword = rel_text.split(",")[2]
    178 if keyword:
    179     keywords.append(keyword.strip(" ()\"'"))

IndexError: list index out of range
The function that builds the visualization is probably just broken for nebula. I'm pretty confident the actual graph is fine

I think this is also some bug with include_text=True.

This class needs a lot of TLC
I welcome a PR
hmm... I removed the include_text=True, but the error is still there
strange, as I was following an example from a Medium article
where it seemed to work fine
I guess something broke over time?
I think include_text=True is the default, might have to set it to false
I'll have to explain that to my boss πŸ˜„
yup, tried with False, still the same thing
I really recommend making a PR. Or using neo4j

KGs are lower priority (they are less helpful/scalable IMO, especially in their current state)

Also starting a refactor/addition next week, so these are basically in legacy mode right now
Oh I see, I'll ask about switching to neo4j then. Thanks πŸ™‚
Plain Text
from llama_index.core import (
    KnowledgeGraphIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.graph_stores.neo4j import Neo4jGraphStore

documents = SimpleDirectoryReader("./data").load_data()

graph_store = Neo4jGraphStore(username='neo4j',
                              password='1234',
                              url="neo4j://localhost:8088",
                              database="neo4j")

storage_context = StorageContext.from_defaults(graph_store=graph_store)

index = KnowledgeGraphIndex.from_documents(documents,
    storage_context=storage_context, max_triplets_per_chunk=2,
    include_embeddings=True
)

query_engine = index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)

response = query_engine.query("Tell me more about Europe")

print(response)
Plain Text
IndexError                                Traceback (most recent call last)
Cell In[9], line 15
      3 index = KnowledgeGraphIndex.from_documents(documents,
      4     storage_context=storage_context, max_triplets_per_chunk=2,
      5     include_embeddings=True
      6 )
      8 query_engine = index.as_query_engine(
      9     include_text=True,
     10     response_mode="tree_summarize",
     11     embedding_mode="hybrid",
     12     similarity_top_k=5,
     13 )
---> 15 response = query_engine.query("Tell me more about Europe")
     17 print(response)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
     38 if isinstance(str_or_query_bundle, str):
     39     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py:186, in RetrieverQueryEngine._query(self, query_bundle)
    182 """Answer a query."""
    183 with self.callback_manager.event(
    184     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
...
--> 177 keyword = rel_text.split(",")[2]
    178 if keyword:
    179     keywords.append(keyword.strip(" ()\"'"))

IndexError: list index out of range
@Logan M the neo4j implementation gives the same error as the nebula graph one above. It seems something is broken in the BaseQueryEngine or in the RetrieverQueryEngine
The traceback is cut off, but it's some error in the knowledge graph retriever
But is this broken in LlamaIndex?
@Logan M I found the error, in llama_index/core/indices/knowledge_graph/retrievers.py:

Plain Text
class KGTableRetriever(BaseRetriever):
    [...]
    def _extract_rel_text_keywords(self, rel_texts: List[str]) -> List[str]:
        """Find the keywords for given rel text triplets."""
        keywords = []
        for rel_text in rel_texts:
            keyword = rel_text.split(",")[0]
            if keyword:
                keywords.append(keyword.strip("(\"'"))
            # Return the Object as well
            keyword = rel_text.split(",")[1]  # was [2]  <-- for some reason this index was set to 2
            if keyword:
                keywords.append(keyword.strip(" ()\"'"))
        return keywords

I changed that index to 1 and it now works.
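For reference, the off-by-one can be reproduced in isolation. This is a hypothetical standalone sketch of the fixed parsing logic (not the library code itself), assuming the stored rel_text strings contain only two comma-separated fields: indexing part `[2]` then raises IndexError, while `[1]` returns the object.

```python
def extract_keywords(rel_texts):
    """Pull subject and object keywords out of rel_text strings
    of the form "('subject', 'object')"."""
    keywords = []
    for rel_text in rel_texts:
        parts = rel_text.split(",")
        # Subject: first comma-separated field
        keyword = parts[0]
        if keyword:
            keywords.append(keyword.strip("(\"'"))
        # Object: second field; parts[2] would raise IndexError
        # when the string holds only two fields
        keyword = parts[1]
        if keyword:
            keywords.append(keyword.strip(" ()\"'"))
    return keywords

print(extract_keywords(["('Europe', 'continent')"]))
# -> ['Europe', 'continent']
```

With `parts[2]` in place of `parts[1]`, the same input raises `IndexError: list index out of range`, matching the traceback above.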
@kapa.ai When I create a neo4j graph_store and then create a storage_context, I get an error:
from llama_index.graph_stores.neo4j import Neo4jGraphStore

graph_store = Neo4jGraphStore(
    username="neo4j",
    password="1234",
    url="bolt://localhost:1234",
    database="neo4j",
    node_label="XXX",
)
storage_context = StorageContext(graph_store=graph_store)

TypeError: StorageContext.__init__() missing 3 required positional arguments: 'docstore', 'index_store', and 'vector_stores'
Let me check the knowledge base for you. One moment! :books:
----------------------
The error you're encountering is because the StorageContext requires three arguments: docstore, index_store, and vector_stores, in addition to graph_store. In the examples provided in the knowledge sources, the StorageContext is created using the from_defaults method, which sets these three arguments to their default values.

Here is how you can create a StorageContext with a Neo4jGraphStore:

Plain Text
from llama_index.core import StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore

graph_store = Neo4jGraphStore(
    username="neo4j",
    password="1234",
    url="bolt://localhost:1234",
    database="neo4j",
    node_label="XXX"
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)


This code is based on the examples provided in the knowledge sources here () and here ().
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai Once I set up this graph_store and do the following:
storage_context = StorageContext.from_defaults(graph_store=graph_store)
retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    retriever_mode="keyword",
)

How do I perform a query of the graph_store?
Let me check the knowledge base for you. One moment! :books:
----------------------
To perform a query on the graph_store, you would typically use a KnowledgeGraphQueryEngine or a KnowledgeGraphRAGQueryEngine. However, based on the knowledge sources provided, there isn't a direct example of querying the graph_store using the KnowledgeGraphRAGRetriever.

Here's an example of how you might query a KnowledgeGraphIndex using a KnowledgeGraphQueryEngine:

Plain Text
from llama_index.core import KnowledgeGraphIndex

# Assuming you have documents and a storage_context
index = KnowledgeGraphIndex.from_documents(documents, storage_context=storage_context)

# Create a query engine
query_engine = index.as_query_engine(include_text=True, response_mode="tree_summarize")

# Perform a query
response = query_engine.query("Your query here")


This code is based on the examples provided in the knowledge sources here () and here ().

Please note that the exact method to query may vary depending on the specific setup and requirements of your project. For more detailed information, you may want to refer to the LlamaIndex documentation ().
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai But I don't have any documents. I just have my existing neo4j graph store, which I want to query. It's now in my storage_context, so how can I index and query the graph store without involving "documents"?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the knowledge sources provided, there isn't a direct example of querying a graph_store without first creating a KnowledgeGraphIndex from documents. The examples provided involve creating a KnowledgeGraphIndex from documents, and then using a KnowledgeGraphQueryEngine to query the index.

However, if you have an existing Neo4j graph store, you can use the query_graph_db method of the Neo4jQueryToolSpec to directly query the Neo4j database. Here's an example:

Plain Text
from llama_index.tools.neo4j import Neo4jQueryToolSpec

# Assuming you have a Neo4jGraphStore instance named graph_store
query_tool = Neo4jQueryToolSpec(graph_store)

# Perform a query
query = "Your Cypher query here"
params = {}  # Optional parameters for your query
results = query_tool.query_graph_db(query, params)


This code is based on the example provided in the knowledge source here ().

Please note that the exact method to query may vary depending on the specific setup and requirements of your project. For more detailed information, you may want to refer to the LlamaIndex documentation ().
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------