Nodes

At a glance

A community member hits a performance problem when creating a Neo4jPropertyGraphStore from an existing graph store with 64,322 nodes. Increasing memory and trying apoc.periodic.iterate did not help; startup stalls on the "CALL apoc.meta.graphSample()" query used to collect schema counts. The thread discusses setting refresh_schema to False, disabling the structured-query flag that triggers a schema refresh after inserts, and removing the automatic schema refresh entirely. No code change is agreed on; the participants settle on documenting the manual workaround.

kkyang

Hi!

Right now creating the Neo4jPropertyGraphStore takes insanely long every time I start it up from an existing graph store. I only have 64,322 nodes in the store right now. Is there some way I can speed this up? Should I simplify the schema, since I generated it using the DynamicLLMPathExtractor? Is there a way to give it more memory/cores?

I managed to narrow the problem down to

schema_counts = self.structured_query(
            "CALL apoc.meta.graphSample()

22 comments

Logan M

Can you clarify what you mean by "starting from scratch"?

Extracting info from 64k nodes will indeed take a long time, which is why you'd only want to do it once and then reuse the graph store

kkyang

Well, I have the nodes extracted. I'm just trying to make the graph store.

kkyang

graph_store = Neo4jPropertyGraphStore(
    username="", password="", url="bolt://localhost:7687", refresh_schema=True
)

kkyang

I've used the same exact code to create the graph store to build the property graph index.

Logan M

Hmm.. I guess it's just neo4j being slow then? Probably schema generation is pretty slow

If you don't need text2cypher, you can set that to false I think (it runs a large query otherwise)

kkyang

There isn't an option for text2cypher.

kkyang

Everything before running schema_counts runs in a decent time.

But when it hits

"CALL apoc.meta.graphSample() YIELD nodes, relationships "
"RETURN nodes, [rel in relationships | {name:apoc.any.property"
"(rel, 'type'), count: apoc.any.property(rel, 'count')}]"
" AS relationships"

it just freezes.
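One way to confirm that the stall comes from the query itself rather than from llama_index is to time it directly against the database. The following is only a sketch: it assumes the official `neo4j` Python driver is installed and APOC is available on the server, and `time_schema_counts` is a hypothetical helper name, not something from llama_index.

```python
import time

# Query text copied from the snippet above.
SCHEMA_COUNTS_QUERY = (
    "CALL apoc.meta.graphSample() YIELD nodes, relationships "
    "RETURN nodes, [rel in relationships | {name:apoc.any.property"
    "(rel, 'type'), count: apoc.any.property(rel, 'count')}]"
    " AS relationships"
)


def time_schema_counts(url, username, password):
    """Run the schema-count query once and return the elapsed seconds."""
    # Imported lazily so this module loads even without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(url, auth=(username, password))
    try:
        with driver.session() as session:
            start = time.perf_counter()
            list(session.run(SCHEMA_COUNTS_QUERY))  # drain all records
            return time.perf_counter() - start
    finally:
        driver.close()
```

Calling `time_schema_counts("bolt://localhost:7687", "", "")` with the same connection details as above should show whether the time is spent inside Neo4j.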

kkyang

I've allocated the maximum amount of memory I can (Activity Monitor shows it's now being used) and I've also tried apoc.periodic.iterate.

kkyang

Similar issue found in the github issues: https://github.com/run-llama/llama_index/issues/16204

kkyang

Is there a point to keeping track of the schema counts if we don't run an enhanced schema? I don't see a way to run it faster because the statistical analysis of apoc.meta.graphSample can't really be sped up. My only solution now is to see if it's avoidable.

Logan M

I'm pretty sure you can just do refresh_schema=False no?

Logan M

That will avoid the expensive calls
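A minimal sketch of that suggestion, reusing the constructor arguments posted earlier in the thread with only `refresh_schema` flipped. It assumes the `llama-index-graph-stores-neo4j` package is installed; `store_kwargs` and `make_store` are hypothetical helper names, and this only covers the schema refresh at construction time.

```python
def store_kwargs(url, username, password):
    """Connection settings with the schema refresh at startup turned off."""
    return {
        "url": url,
        "username": username,
        "password": password,
        "refresh_schema": False,  # skip the schema queries when the store is created
    }


def make_store(**kwargs):
    """Build the store; the import is deferred until the store is actually needed."""
    from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

    return Neo4jPropertyGraphStore(**kwargs)


kwargs = store_kwargs("bolt://localhost:7687", "", "")
# graph_store = make_store(**kwargs)  # requires a running Neo4j instance
```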

kkyang

Would that remove the functionality of structured queries?

Logan M

nope

Logan M

it's only used for text2cypher

Logan M

afaik

kkyang

Lovely, thank you for the help!

kkyang

@Logan M hate to bring this up again, but it looks like just setting refresh_schema=False doesn't work: supports_structured_queries is always True for Neo4j, and the code that inserts new nodes always checks that variable and calls get_schema(refresh=True).

kkyang

I personally think it's safe to set supports_structured_queries to False, since the Text2Cypher retrievers do have a check against it. Let me know if you want me to make a PR that makes this a parameter, or if you think leaving it as is should be the intended behavior.

I'm only concerned because this is a very time-consuming and perhaps compute-heavy call. It happens at the end of the insertion logic, so the insertions themselves shouldn't be impacted, but it can be confusing why the code is stalling, and it may block distributed calls from finishing.
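To make the behavior concrete, here is a toy sketch of the check described above. `StubStore` and `insert_nodes` are hypothetical stand-ins written for illustration, not llama_index code, and the attribute is assumed to be named `supports_structured_queries`. It only shows that flipping the flag on the store instance skips the refresh after an insert.

```python
class StubStore:
    """Hypothetical stand-in for a graph store; not the llama_index class."""

    supports_structured_queries = True

    def __init__(self):
        self.schema_refreshes = 0

    def get_schema(self, refresh=False):
        if refresh:
            self.schema_refreshes += 1  # stands in for the slow APOC call
        return {}


def insert_nodes(store, nodes):
    """Mimic the insert path: refresh the schema only when the flag is set."""
    inserted = list(nodes)
    if store.supports_structured_queries:
        store.get_schema(refresh=True)
    return inserted


default_store = StubStore()
insert_nodes(default_store, ["a", "b"])

patched_store = StubStore()
patched_store.supports_structured_queries = False  # the manual override
insert_nodes(patched_store, ["a", "b"])

print(default_store.schema_refreshes, patched_store.schema_refreshes)
```

With a real store the equivalent would be setting the attribute on the store instance before inserting, at the cost of anything that relies on structured queries being reported as supported.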

Logan M

yea fair. I'm not sure about the best way to expose this. Maybe it's removed completely from the code and the user has to manually call refresh schema when needed (a bit of a breaking change sadly)

kkyang

Yeah. That sounds like too major of a breaking change. I can just add some documentation at this point, so anyone else who gets stuck on this can manually change the variable without having to dig.

Logan M

yea that's probably better
