Please stand by @Callam as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The Property Graph Index in LlamaIndex uses a query engine that can be created with the as_query_engine method. This query engine can be customized to include text in the results and can be used to query the index. Here is an example of how to create and use the query engine:
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
This code creates a query engine that includes the original source text in the results. It then uses the query method to query the index with a specific question and prints the response.
For more details on customizing retrieval and querying, you can refer to the documentation.
In addition to the default query engine, you can also create a custom query engine or a Retriever Query Engine. More information on these can be found in the module guide on Indexes.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai How does the default query engine work for the property graph index?
Let me check the knowledge base for you. One moment! :books:
----------------------
The default query engine for the Property Graph Index in LlamaIndex works by using one or more sub-retrievers to query the index and combine the results. The process can be thought of as selecting nodes and traversing from those nodes.
By default, two types of retrieval are used in unison:
- Synonym/keyword expansion: This uses the LLM (large language model) to generate synonyms and keywords from the query.
- Vector retrieval: This uses embeddings to find nodes in your graph.
Once nodes are found, you can either:
- Return the paths adjacent to the selected nodes (i.e., triples)
- Return the paths + the original source text of the chunk (if available)
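The select-then-traverse idea above can be sketched as a small toy, assuming a flat list of triples; none of this is LlamaIndex internals, and all names and data here are illustrative:

```python
# Toy illustration of "select nodes, then traverse": match entities by
# keyword, then return the triples (paths) adjacent to the matched nodes.
# Data and names are hypothetical, not LlamaIndex code.

TRIPLES = [
    ("Paul Graham", "worked_at", "Interleaf"),
    ("Paul Graham", "founded", "Viaweb"),
    ("Viaweb", "acquired_by", "Yahoo"),
]

def adjacent_triples(keywords):
    """Return every triple whose subject or object matches a keyword."""
    hits = []
    for subj, rel, obj in TRIPLES:
        if subj in keywords or obj in keywords:
            hits.append((subj, rel, obj))
    return hits
```

With include_text=True, the real engine would additionally attach the source chunk each matched triple came from.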
Here is an example of how to use the default query engine:
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
In this example, include_text=True means that the original source text of the chunk will be returned along with the paths. The query method is then used to query the index with a specific question, and the response is printed.
For more details on customizing retrieval and querying, you can refer to the documentation.
----------------------
@kapa.ai how does the Synonym/keyword expansion work with the knowledge/property graph?
Let me check the knowledge base for you. One moment! :books:
----------------------
Synonym/keyword expansion in LlamaIndex works by using a function to expand synonyms or similar expressions for keywords. This is part of the KnowledgeGraphRAGRetriever class in LlamaIndex.
Here is a brief overview of how it works:
- The _aexpand_synonyms method expands synonyms or similar expressions for keywords. It calls the _aprocess_entities method, which takes several parameters: the keywords, a function to expand synonyms (_synonym_expand_fn), a template for expanding synonyms (_synonym_expand_template), a policy for expanding synonyms (_synonym_expand_policy), and the maximum number of synonyms (_max_synonyms).
- The _get_entities method gets entities from a query string. It first uses the _process_entities method to extract entities from the query string, then uses the _expand_synonyms method to expand synonyms for the extracted entities. The result is a list of entities and their expanded synonyms.
Here is a snippet of the relevant code:
def _expand_synonyms(self, keywords: List[str]) -> List[str]:
    """Expand synonyms or similar expressions for keywords."""
    return self._process_entities(
        str(keywords),
        self._synonym_expand_fn,
        self._synonym_expand_template,
        self._synonym_expand_policy,
        self._max_synonyms,
        "SYNONYMS:",
    )

def _get_entities(self, query_str: str) -> List[str]:
    """Get entities from query string."""
    entities = self._process_entities(
        query_str,
        self._entity_extract_fn,
        self._entity_extract_template,
        self._entity_extract_policy,
        self._max_entities,
        "KEYWORDS:",
    )
    expanded_entities = self._expand_synonyms(entities)
    return list(set(entities) | set(expanded_entities))
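The net effect of _get_entities is a set union of the extracted entities and their expansions. A minimal standalone illustration of that union, with a hypothetical lookup table standing in for the LLM call:

```python
# Minimal illustration of combining raw entities with expanded synonyms
# via set union, as _get_entities does. The expansion table here is a
# hypothetical stand-in for the LLM-driven _expand_synonyms call.

def expand_synonyms(entities):
    """Hypothetical expansion table in place of the LLM."""
    table = {"Interleaf": ["Interleaf Inc."], "Viaweb": ["Viaweb LLC"]}
    expanded = []
    for entity in entities:
        expanded.extend(table.get(entity, []))
    return expanded

entities = ["Interleaf", "Viaweb"]
merged = list(set(entities) | set(expand_synonyms(entities)))
```

The union means duplicates between the raw and expanded lists collapse, so each candidate entity is searched only once.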
For more details, you can refer to the documentation.
----------------------
@Callam it uses the same query engine as all other indexes, a retriever query engine
The real meat and potatoes is in the retriever
@kapa.ai why does the code below fail?
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
I see I am confused about what the synonyms are used to search. Are they used as keywords to search the graph directly, or are they embedded as well?
The synonyms/keywords are used to find matching entities
From there, the triple is returned, or the text chunk where that triple came from
embeddings are used in a separate VectorContextRetriever
By default, both are used
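The two retrieval paths described above can be sketched as a toy union of results, assuming each retriever returns a set of matched node names; both retriever functions here are hypothetical stand-ins, not LlamaIndex code:

```python
# Toy sketch: run a keyword/synonym retriever and a vector retriever,
# then merge their node hits, mirroring how the default query engine
# combines its sub-retrievers. Both functions are illustrative stand-ins.

def keyword_retrieve(query):
    # entities matched by LLM-generated synonyms/keywords
    return {"Interleaf", "Viaweb"}

def vector_retrieve(query):
    # nodes found by embedding similarity
    return {"Viaweb", "Yahoo"}

def retrieve(query):
    """Union of both sub-retrievers, as used by default."""
    return keyword_retrieve(query) | vector_retrieve(query)
```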
it's going to be hard to answer this without a traceback imo
the weird part about the failure is that it fails on some questions and not others?
Getting the stack trace rn
ah, seems like a small bug, not properly handling a NoneType here
Did you create the graph using from_documents()? Seems it's failing to find the source node
Maybe this is an edge case where the synonyms/keywords don't find any matches?
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Generate index from documents
index = PropertyGraphIndex.from_documents(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    property_graph_store=graph_store,
    show_progress=True,
)

# load from existing graph/vector store
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    embed_kg_nodes=True,
)
print(index)
the code is basically straight out of the example repo
Hi @Logan M I am working on this with callam, thank you for your help!
I think this could be the issue; I can get this error to be thrown simply by toggling include_text to True in the following code. Why does setting include_text=False make the error stop?
query_engine = index.as_query_engine(
    include_text=False,
)
The error here is when it tries to include the source text
So if you turn that feature off, then there's no issue
Should be fixable with a quick pr (assuming you are already using the latest version of llama-index-core and llama-index-graph-stores-neo4j)
@Logan M Thank you, I think I am on the latest version, I'll check.
For more context here is the logger output leading up to the error.
Name: llama-index-core
Version: 0.10.41
Name: llama-index-graph-stores-neo4j
Version: 0.2.0
just updated llama-index-graph-stores-neo4j to Version: 0.2.2
and can confirm the issue still happens
Great, how long do you estimate the pr will take?
fastest hands in the west
can confirm adding a check

response = self.structured_query(cypher_statement, param_map=params)
if response is None:
    print("No data returned from the query.")
    return []

in the get method fixes the issue
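A runnable toy version of that guard, with a stand-in for the Neo4j call so the None path can be exercised (both function names here are illustrative, not the actual store's methods):

```python
# Toy version of the fix: if the backing query returns None instead of
# rows, return an empty list rather than crashing while iterating None.
# structured_query is a hypothetical stand-in for the Neo4j call.

def structured_query(cypher, param_map=None):
    """Simulates the failure mode: no data comes back."""
    return None

def get(ids):
    response = structured_query("MATCH (n) RETURN n", param_map={"ids": ids})
    if response is None:  # the added check
        return []
    return [row for row in response]
```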
thanks for the fast help!
Yea I managed to reproduce the issue with graph_store.get(ids=[])
Just merged. New version of the neo4j graph store package should be published soon automatically
@Logan M is there any functionality in Llama Index for entity resolution? Like if one schema extraction pulls out a person named John Smith and another schema extraction pulls out a John M. Smith, and they work at the same company, went to the same school, and got the same degree. Are there any methods for combining those entities?
If not in Llama Index is there any other tools you have looked at?
not quite in llama-index, but definitely could be added as an additional kg module. It's honestly a super tricky area with a lot of research, but a lack of actual production-ready stuff
One approach I had in mind but never fully validated, was clustering kg nodes and then getting the LLM to specify if any are referring to the same thing
I think that makes sense. Honestly kinda what I was thinking as well. Clustering using embeddings and then having an LLM combine information or prioritize via source
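That cluster-then-merge idea can be roughed out with trivially simple token overlap standing in for embedding similarity; everything below is a hypothetical sketch, not a validated approach:

```python
# Rough sketch of entity resolution by clustering: group candidate entity
# names whose normalized token sets overlap heavily, then let a human or
# LLM confirm each proposed merge. Token-set Jaccard similarity stands in
# for embedding similarity here.

def tokens(name):
    """Lowercased tokens with trailing periods stripped ('M.' -> 'm')."""
    return {t.strip(".").lower() for t in name.split()}

def similar(a, b, threshold=0.5):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) >= threshold

def cluster(names):
    """Greedy single-pass clustering: join the first similar group."""
    clusters = []
    for name in names:
        for group in clusters:
            if any(similar(name, member) for member in group):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Each resulting cluster would then be handed to the LLM to decide whether its members really refer to the same entity and how to merge their attributes.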
any direction on where to look research wise?
Probably terms like "entity resolution" and "entity linking" will get you where you need to go (I usually go for google scholar or paperswithcode)
ok cool, I wasn't sure if there were some orgs you liked that are working on it