Updated 9 months ago

KnowledgeGraphIndex in LlamaIndex seems to be broken. Here is the complete code to generate a knowledge graph:

Plain Text
from llama_index.core import Document, KnowledgeGraphIndex

text_chunks = ["cars are red", "cars are big", "elephants are red"]

doc_chunks = []
for i, text in enumerate(text_chunks):
    doc = Document(text=text, id_=f"doc_id_{i}")
    doc_chunks.append(doc)

# storage_context, space_name, edge_types, rel_prop_names, tags,
# llm, and embed_model are set up earlier (NebulaGraph configuration)
kg_index = KnowledgeGraphIndex.from_documents(
    documents=doc_chunks,
    storage_context=storage_context,
    max_triplets_per_chunk=3,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    llm=llm,
    embed_model=embed_model,
    include_embeddings=True,
)

import matplotlib.pyplot as plt
import networkx as nx

G = kg_index.get_networkx_graph()
nx.draw(G, with_labels=True, font_weight='bold')
plt.show()


And I get this:
[rendered plot: a tangle of nodes labeled with single characters such as "1", "b", "L", "d", and one empty node]
why are there so many nodes with just one character inside, and even an empty one?
probably it's not being generated/parsed properly

If you explored the graph in nebula (I think you are using nebula?) it would probably make more sense
is there an easy way to plot it directly from nebula?
doesn't nebula have a GUI?
I thought it had some kind of dashboard
yes, it does. I was asking whether there is a function to plot the graph (from nebula) directly in a Jupyter notebook, but I'll use the dashboard, thanks
The thing is, I'm fairly sure something is wrong with the graph it generates: when I try building it from a PDF file (about one page of the Wikipedia article about Europe), I get

Plain Text
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[70], line 2
      1 hybrid_query_engine = kg_index.as_query_engine(llm=llm, embed_model=embed_model)
----> 2 response = hybrid_query_engine.query("Tell me about Europe.")
      3 print(response)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
     38 if isinstance(str_or_query_bundle, str):
     39     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py:186, in RetrieverQueryEngine._query(self, query_bundle)
    182 """Answer a query."""
    183 with self.callback_manager.event(
    184     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    185 ) as query_event:
--> 186     nodes = self.retrieve(query_bundle)
    187     response = self._response_synthesizer.synthesize(
    188         query=query_bundle,
    189         nodes=nodes,
    190     )
    192     query_event.on_end(payload={EventPayload.RESPONSE: response})

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py:142, in RetrieverQueryEngine.retrieve(self, query_bundle)
...
--> 177 keyword = rel_text.split(",")[2]
    178 if keyword:
    179     keywords.append(keyword.strip(" ()\"'"))

IndexError: list index out of range
The function that builds the visualization is probably just broken for nebula. I'm pretty confident the actual graph is fine

I think this is also some bug with include_text=True.

This class needs a lot of TLC
I welcome a PR
hmm... I removed the include_text=True, but the error is still there
strange, as I was following an example from a Medium article
where it seemed to work fine
I guess something broke over time?
I think include_text=True is the default, might have to set it to false
I'll have to explain that to my boss πŸ˜„
yup, tried with False, still the same thing
I really recommend making a PR. Or using neo4j

KGs are lower priority (they are less helpful/scalable IMO, especially in their current state)

Also starting a refactor/addition next week, so these are basically in legacy mode right now
Oh I see, I'll ask about switching to neo4j then. Thanks πŸ™‚
Plain Text
from llama_index.core import (
    KnowledgeGraphIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.graph_stores.neo4j import Neo4jGraphStore

documents = SimpleDirectoryReader("./data").load_data()

graph_store = Neo4jGraphStore(username='neo4j',
                              password='1234',
                              url="neo4j://localhost:8088",
                              database="neo4j")

storage_context = StorageContext.from_defaults(graph_store=graph_store)

index = KnowledgeGraphIndex.from_documents(documents,
    storage_context=storage_context, max_triplets_per_chunk=2,
    include_embeddings=True
)

query_engine = index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)

response = query_engine.query("Tell me more about Europe")

print(response)
Plain Text
IndexError                                Traceback (most recent call last)
Cell In[9], line 15
      3 index = KnowledgeGraphIndex.from_documents(documents,
      4     storage_context=storage_context, max_triplets_per_chunk=2,
      5     include_embeddings=True
      6 )
      8 query_engine = index.as_query_engine(
      9     include_text=True,
     10     response_mode="tree_summarize",
     11     embedding_mode="hybrid",
     12     similarity_top_k=5,
     13 )
---> 15 response = query_engine.query("Tell me more about Europe")
     17 print(response)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
     38 if isinstance(str_or_query_bundle, str):
     39     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)

File ~/anaconda3/envs/llamaindex10/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py:186, in RetrieverQueryEngine._query(self, query_bundle)
    182 """Answer a query."""
    183 with self.callback_manager.event(
    184     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
...
--> 177 keyword = rel_text.split(",")[2]
    178 if keyword:
    179     keywords.append(keyword.strip(" ()\"'"))

IndexError: list index out of range
@Logan M the neo4j implementation gives the same error as the nebula graph one above. It seems something is broken in the BaseQueryEngine or in the RetrieverQueryEngine
The traceback is cut off, but it's some error in the knowledge graph retriever
But is this broken in LlamaIndex?
@Logan M I found the error, in llama_index/core/indices/knowledge_graph/retrievers.py:

Plain Text
class KGTableRetriever(BaseRetriever):
    [...]
    def _extract_rel_text_keywords(self, rel_texts: List[str]) -> List[str]:
        """Find the keywords for given rel text triplets."""
        keywords = []
        for rel_text in rel_texts:
            keyword = rel_text.split(",")[0]
            if keyword:
                keywords.append(keyword.strip("(\"'"))
            # Return the Object as well
            keyword = rel_text.split(",")[1]  # was [2]  <-- for some reason this index was set to 2
            if keyword:
                keywords.append(keyword.strip(" ()\"'"))
        return keywords

I changed that index to 1 and it now works.
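For reference, the off-by-one can be reproduced in isolation. This is a hypothetical standalone sketch of the fixed parsing logic (not the library code itself), assuming the stored rel_text strings contain only two comma-separated fields: indexing part `[2]` then raises IndexError, while `[1]` returns the object.

```python
def extract_keywords(rel_texts):
    """Pull subject and object keywords out of rel_text strings
    of the form "('subject', 'object')"."""
    keywords = []
    for rel_text in rel_texts:
        parts = rel_text.split(",")
        # Subject: first comma-separated field
        keyword = parts[0]
        if keyword:
            keywords.append(keyword.strip("(\"'"))
        # Object: second field; parts[2] would raise IndexError
        # when the string holds only two fields
        keyword = parts[1]
        if keyword:
            keywords.append(keyword.strip(" ()\"'"))
    return keywords

print(extract_keywords(["('Europe', 'continent')"]))
# -> ['Europe', 'continent']
```

With `parts[2]` in place of `parts[1]`, the same input raises `IndexError: list index out of range`, matching the traceback above.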
@kapa.ai When I create a neo4j graph_store and then create a storage_context, I get an error:
from llama_index.graph_stores.neo4j import Neo4jGraphStore

graph_store = Neo4jGraphStore(
    username="neo4j",
    password="1234",
    url="bolt://localhost:1234",
    database="neo4j",
    node_label="XXX",
)
storage_context = StorageContext(graph_store=graph_store)

TypeError: StorageContext.__init__() missing 3 required positional arguments: 'docstore', 'index_store', and 'vector_stores'
Let me check the knowledge base for you. One moment! :books:
----------------------
The error you're encountering is because the StorageContext requires three arguments: docstore, index_store, and vector_stores, in addition to graph_store. In the examples provided in the knowledge sources, the StorageContext is created using the from_defaults method, which sets these three arguments to their default values.

Here is how you can create a StorageContext with a Neo4jGraphStore:

Plain Text
from llama_index.core import StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore

graph_store = Neo4jGraphStore(
    username="neo4j",
    password="1234",
    url="bolt://localhost:1234",
    database="neo4j",
    node_label="XXX"
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)


This code is based on the examples provided in the knowledge sources here () and here ().
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai Once I set up this graph_store and do the following:
storage_context = StorageContext.from_defaults(graph_store=graph_store)
retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    retriever_mode="keyword",
)

How do I perform a query of the graph_store?
Let me check the knowledge base for you. One moment! :books:
----------------------
To perform a query on the graph_store, you would typically use a KnowledgeGraphQueryEngine or a KnowledgeGraphRAGQueryEngine. However, based on the knowledge sources provided, there isn't a direct example of querying the graph_store using the KnowledgeGraphRAGRetriever.

Here's an example of how you might query a KnowledgeGraphIndex using a KnowledgeGraphQueryEngine:

Plain Text
from llama_index.core import KnowledgeGraphIndex

# Assuming you have documents and a storage_context
index = KnowledgeGraphIndex.from_documents(documents, storage_context=storage_context)

# Create a query engine
query_engine = index.as_query_engine(include_text=True, response_mode="tree_summarize")

# Perform a query
response = query_engine.query("Your query here")


This code is based on the examples provided in the knowledge sources here () and here ().

Please note that the exact method to query may vary depending on the specific setup and requirements of your project. For more detailed information, you may want to refer to the LlamaIndex documentation ().
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai But I don't have any documents. I just have my existing neo4j graph store, which I want to query. It's now in my storage_context, so how can I index and query the graph store without involving "documents"?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the knowledge sources provided, there isn't a direct example of querying a graph_store without first creating a KnowledgeGraphIndex from documents. The examples provided involve creating a KnowledgeGraphIndex from documents, and then using a KnowledgeGraphQueryEngine to query the index.

However, if you have an existing Neo4j graph store, you can use the query_graph_db method of the Neo4jQueryToolSpec to directly query the Neo4j database. Here's an example:

Plain Text
from llama_index.tools.neo4j import Neo4jQueryToolSpec

# Assuming you have a Neo4jGraphStore instance named graph_store
query_tool = Neo4jQueryToolSpec(graph_store)

# Perform a query
query = "Your Cypher query here"
params = {}  # Optional parameters for your query
results = query_tool.query_graph_db(query, params)


This code is based on the example provided in the knowledge source here ().

Please note that the exact method to query may vary depending on the specific setup and requirements of your project. For more detailed information, you may want to refer to the LlamaIndex documentation ().
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------