Hello, everyone. I'm looking for some advice on efficient PropertyGraphIndex indexing practices, with the hope that there are ways to speed up the embedding and indexing process.
I'm currently processing a large amount of JSON data, spread across around 15 separate documents, using JSONNodeParser() and parser.get_nodes_from_documents() to extract the JSON objects as Nodes. I'm then using Ollama and OllamaEmbedding models to push them to a Neo4j instance via PropertyGraphIndex. I've had some success on individual smaller files, but it looks like it'll take many, many hours to complete the embedding and indexing process for everything. I'm prepared to just "hurry up and wait" if there are no other options, but I figured I'd reach out to the community and see if there are recommendations on ways to improve processing times before I just let it run. Any and all feedback is welcomed and appreciated. Thanks!
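For reference, here's a rough sketch of that workflow. The model names, Neo4j credentials, file paths, and batch sizes below are placeholders rather than my exact setup; `embed_batch_size` (fewer HTTP round trips to Ollama) and `num_workers` (concurrent extraction calls) are the knobs that seem most relevant to throughput:

```python
def batched(items, size):
    """Yield successive chunks of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_index(json_paths):
    """Parse JSON files into nodes and push them to Neo4j via PropertyGraphIndex."""
    # Imports live inside the function so this file loads even without llama-index
    # (and its ollama/neo4j integration packages) installed.
    from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
    from llama_index.core.indices.property_graph import SimpleLLMPathExtractor
    from llama_index.core.node_parser import JSONNodeParser
    from llama_index.embeddings.ollama import OllamaEmbedding
    from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
    from llama_index.llms.ollama import Ollama

    documents = SimpleDirectoryReader(input_files=json_paths).load_data()
    nodes = JSONNodeParser().get_nodes_from_documents(documents)

    # Larger embed batches mean fewer round trips to the Ollama server.
    embed_model = OllamaEmbedding(model_name="nomic-embed-text", embed_batch_size=32)
    llm = Ollama(model="llama3", request_timeout=300.0)

    # num_workers issues several LLM extraction calls concurrently.
    kg_extractor = SimpleLLMPathExtractor(llm=llm, num_workers=8)

    graph_store = Neo4jPropertyGraphStore(
        username="neo4j", password="password", url="bolt://localhost:7687"
    )

    # Build from an initial slice, then insert the rest in batches so
    # progress is visible and a crash doesn't lose everything.
    first, *rest = batched(list(nodes), 500)
    index = PropertyGraphIndex(
        nodes=first,
        kg_extractors=[kg_extractor],
        embed_model=embed_model,
        property_graph_store=graph_store,
        show_progress=True,
    )
    for chunk in rest:
        index.insert_nodes(chunk)
    return index
```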
P.S. If any other info on my current workflow would be helpful, I'm more than willing to provide it!
Gotcha, makes sense. I'm running my Ollama instance on a GPU-enabled AWS EC2 instance at the moment. Would giving the LLMs more horsepower make a difference, or is it more likely just the sheer number of calls that's driving the processing time?
Also, I really appreciate you taking the time to provide input. I'm still new to llama-index, so I'm just looking to learn the right ways to do things.
I think the latest version of Ollama allows for running replicas of a model automatically (assuming you have the memory for it?) -- not entirely sure how that works, though
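From what I understand, the relevant knobs are environment variables read by `ollama serve` in recent releases -- worth double-checking the docs for your version, and the values below are just example settings:

```shell
# OLLAMA_NUM_PARALLEL: parallel requests served per loaded model
# OLLAMA_MAX_LOADED_MODELS: how many models may stay loaded at once
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_LOADED_MODELS=2
ollama serve
```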