Please stand by @Callam as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The Property Graph Index in LlamaIndex uses a query engine that can be created with the as_query_engine method. This query engine can be customized to include text in the results and can be used to query the index. Here is an example of how to create and use the query engine:
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
This code creates a query engine that includes the original source text in the results. It then uses the query method to query the index with a specific question and prints the response.
For more details on customizing retrieval and querying, you can refer to the documentation.
In addition to the default query engine, you can also create a custom query engine or a Retriever Query Engine. More information on these can be found in the module guide on Indexes.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai How does the default query engine work for the property graph index?
Let me check the knowledge base for you. One moment! :books:
----------------------
The default query engine for the Property Graph Index in LlamaIndex works by using one or more sub-retrievers to query the index and combine the results. The process can be thought of as selecting nodes and traversing from those nodes.
By default, two types of retrieval are used in unison:
- Synonym/keyword expansion: This uses the LLM (large language model) to generate synonyms and keywords from the query.
- Vector retrieval: This uses embeddings to find nodes in your graph.
Once nodes are found, you can either:
- Return the paths adjacent to the selected nodes (i.e., triples)
- Return the paths + the original source text of the chunk (if available)
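The select-then-traverse idea above can be sketched as a small toy, assuming a flat list of triples; none of this is LlamaIndex internals, and all names and data here are illustrative:

```python
# Toy illustration of "select nodes, then traverse": match entities by
# keyword, then return the triples (paths) adjacent to the matched nodes.
# Data and names are hypothetical, not LlamaIndex code.

TRIPLES = [
    ("Paul Graham", "worked_at", "Interleaf"),
    ("Paul Graham", "founded", "Viaweb"),
    ("Viaweb", "acquired_by", "Yahoo"),
]

def adjacent_triples(keywords):
    """Return every triple whose subject or object matches a keyword."""
    hits = []
    for subj, rel, obj in TRIPLES:
        if subj in keywords or obj in keywords:
            hits.append((subj, rel, obj))
    return hits
```

With include_text=True, the real engine would additionally attach the source chunk each matched triple came from.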
Here is an example of how to use the default query engine:
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
In this example, include_text=True means that the original source text of the chunk will be returned along with the paths. The query method is then used to query the index with a specific question, and the response is printed.
For more details on customizing retrieval and querying, you can refer to the documentation.
----------------------
@kapa.ai how does the Synonym/keyword expansion work with the knowledge/property graph?
Let me check the knowledge base for you. One moment! :books:
----------------------
Synonym/keyword expansion in LlamaIndex works by using a function to expand synonyms or similar expressions for keywords. This is part of the KnowledgeGraphRAGRetriever class in LlamaIndex.
Here is a brief overview of how it works:
- The _aexpand_synonyms method expands synonyms or similar expressions for keywords. It calls the _aprocess_entities method, which takes several parameters: the keywords, a function to expand synonyms (_synonym_expand_fn), a template for expanding synonyms (_synonym_expand_template), a policy for expanding synonyms (_synonym_expand_policy), and the maximum number of synonyms (_max_synonyms).
- The _get_entities method gets entities from a query string. It first uses the _process_entities method to extract entities from the query string, then uses the _expand_synonyms method to expand synonyms for the extracted entities. The result is a list of entities and their expanded synonyms.
Here is a snippet of the relevant code:
def _expand_synonyms(self, keywords: List[str]) -> List[str]:
    """Expand synonyms or similar expressions for keywords."""
    return self._process_entities(
        str(keywords),
        self._synonym_expand_fn,
        self._synonym_expand_template,
        self._synonym_expand_policy,
        self._max_synonyms,
        "SYNONYMS:",
    )

def _get_entities(self, query_str: str) -> List[str]:
    """Get entities from query string."""
    entities = self._process_entities(
        query_str,
        self._entity_extract_fn,
        self._entity_extract_template,
        self._entity_extract_policy,
        self._max_entities,
        "KEYWORDS:",
    )
    expanded_entities = self._expand_synonyms(entities)
    return list(set(entities) | set(expanded_entities))
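The net effect of _get_entities is a set union of the extracted entities and their expansions. A minimal standalone illustration of that union, with a hypothetical lookup table standing in for the LLM call:

```python
# Minimal illustration of combining raw entities with expanded synonyms
# via set union, as _get_entities does. The expansion table here is a
# hypothetical stand-in for the LLM-driven _expand_synonyms call.

def expand_synonyms(entities):
    """Hypothetical expansion table in place of the LLM."""
    table = {"Interleaf": ["Interleaf Inc."], "Viaweb": ["Viaweb LLC"]}
    expanded = []
    for entity in entities:
        expanded.extend(table.get(entity, []))
    return expanded

entities = ["Interleaf", "Viaweb"]
merged = list(set(entities) | set(expand_synonyms(entities)))
```

The union means duplicates between the raw and expanded lists collapse, so each candidate entity is searched only once.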
For more details, you can refer to the documentation.
----------------------
@Callam it uses the same query engine as all other indexes, a retriever query engine
The real meat and potatoes is in the retriever
@kapa.ai why does the code below fail?
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
I see I am confused about what the synonyms are used to search. Are they used as keywords to search the graph directly, or are they embedded as well?
The synonyms/keywords are used to find matching entities
From there, the triple is returned, or the text chunk where that triple came from
embeddings are used in a separate VectorContextRetriever
By default, both are used
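The two retrieval paths described above can be sketched as a toy union of results, assuming each retriever returns a set of matched node names; both retriever functions here are hypothetical stand-ins, not LlamaIndex code:

```python
# Toy sketch: run a keyword/synonym retriever and a vector retriever,
# then merge their node hits, mirroring how the default query engine
# combines its sub-retrievers. Both functions are illustrative stand-ins.

def keyword_retrieve(query):
    # entities matched by LLM-generated synonyms/keywords
    return {"Interleaf", "Viaweb"}

def vector_retrieve(query):
    # nodes found by embedding similarity
    return {"Viaweb", "Yahoo"}

def retrieve(query):
    """Union of both sub-retrievers, as used by default."""
    return keyword_retrieve(query) | vector_retrieve(query)
```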
it's going to be hard to answer this without a traceback imo
the weird part about the failure is that it fails on some questions and not others?
Getting the stack trace rn
ah, seems like a small bug, not properly handling a NoneType here
Did you create the graph using from_documents()? Seems it's failing to find the source node
Maybe this is an edge case where the synonyms/keywords don't find any matches?
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Generate index from documents
index = PropertyGraphIndex.from_documents(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    property_graph_store=graph_store,
    show_progress=True,
)

# load from existing graph/vector store
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    embed_kg_nodes=True,
)
print(index)
the code is basically straight out of the example repo
Hi @Logan M I am working on this with callam, thank you for your help!
I think this could be the issue; I can get this error to be thrown simply by toggling include_text to True in the following code. Why does setting include_text=False make the error stop?
query_engine = index.as_query_engine(
    include_text=False,
)
The error here is when it tries to include the source text
So if you turn that feature off, then there's no issue
Should be fixable with a quick pr (assuming you are already using the latest version of llama-index-core and llama-index-graph-stores-neo4j)
@Logan M Thank you, I think I am on the latest version, I'll check.
For more context here is the logger output leading up to the error.
Name: llama-index-core
Version: 0.10.41
Name: llama-index-graph-stores-neo4j
Version: 0.2.0
just updated llama-index-graph-stores-neo4j to Version: 0.2.2
and can confirm the issue still happens
Great, how long do you estimate the pr will take?
fastest hands in the west
can confirm adding a check

response = self.structured_query(cypher_statement, param_map=params)
if response is None:
    print("No data returned from the query.")
    return []

in the get method fixes the issue
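A runnable toy version of that guard, with a stand-in for the Neo4j call so the None path can be exercised (both function names here are illustrative, not the actual store's methods):

```python
# Toy version of the fix: if the backing query returns None instead of
# rows, return an empty list rather than crashing while iterating None.
# structured_query is a hypothetical stand-in for the Neo4j call.

def structured_query(cypher, param_map=None):
    """Simulates the failure mode: no data comes back."""
    return None

def get(ids):
    response = structured_query("MATCH (n) RETURN n", param_map={"ids": ids})
    if response is None:  # the added check
        return []
    return [row for row in response]
```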
thanks for the fast help!
Yea I managed to reproduce the issue with graph_store.get(ids=[])
Just merged. New version of the neo4j graph store package should be published soon automatically
@Logan M is there any functionality in Llama Index for entity resolution? Like if one schema extraction pulls out a person named John Smith and another schema extraction pulls out a John M. Smith, and they work at the same company, went to the same school, and got the same degree. Are there any methods for combining those entities?
If not in Llama Index is there any other tools you have looked at?
not quite in llama-index, but definitely could be added as an additional kg module. It's honestly a super tricky area with a lot of research, but a lack of actual production-ready stuff
One approach I had in mind but never fully validated, was clustering kg nodes and then getting the LLM to specify if any are referring to the same thing
I think that makes sense. Honestly kinda what I was thinking as well. Clustering using embeddings and then having an LLM combine information or prioritize via source
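That cluster-then-merge idea can be roughed out with trivially simple token overlap standing in for embedding similarity; everything below is a hypothetical sketch, not a validated approach:

```python
# Rough sketch of entity resolution by clustering: group candidate entity
# names whose normalized token sets overlap heavily, then let a human or
# LLM confirm each proposed merge. Token-set Jaccard similarity stands in
# for embedding similarity here.

def tokens(name):
    """Lowercased tokens with trailing periods stripped ('M.' -> 'm')."""
    return {t.strip(".").lower() for t in name.split()}

def similar(a, b, threshold=0.5):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) >= threshold

def cluster(names):
    """Greedy single-pass clustering: join the first similar group."""
    clusters = []
    for name in names:
        for group in clusters:
            if any(similar(name, member) for member in group):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Each resulting cluster would then be handed to the LLM to decide whether its members really refer to the same entity and how to merge their attributes.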
any direction on where to look research wise?
Probably terms like "entity resolution" and "entity linking" will get you where you need to go (I usually go for google scholar or paperswithcode)
ok cool, I wasn't sure if there were some orgs you liked that are working on it