Error

Has anyone faced this error while using the GraphRAG Colab notebook (https://docs.llamaindex.ai/en/stable/examples/property_graph/property_graph_neo4j/)?
[Attachments: image.png, image.png]
Oh, I just fixed this:
pip install -U llama-index-core
It's still hit or miss. Sometimes it works, and other times it throws the same error as below. But this time it got stuck at 59% after I restarted it following a hang at 22%. This is after running "pip install -U llama-index-core". Thanks.
[Attachment: Screenshot_2025-01-06_015820.png]
Did you restart the notebook? The latest versions of llama-index-core completely removed that assert, so you are still running old code.
It's working now, but I had to load the packages as below (removing llama-index from the notebook's pip command):
[Attachment: image.png]
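For anyone hitting the same thing, the install probably looks something like this; the exact package list is a guess based on the Neo4j property-graph example's imports, since the attachment isn't legible here:
Plain Text
pip install -U llama-index-core llama-index-llms-openai llama-index-embeddings-openai llama-index-graph-stores-neo4j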
A few points as I'm closing this thread: A) Why does it take such a long time to create the Neo4j graph? I tested on two PDF documents (300 + 40 pages) and it took 8 minutes. What's the recommended way to create a graph from a database of 2,000 scientific papers (35 million tokens)?
B) I am making this tool to be used offline at a customer site. Can I create the graph using a language model and embedding model from OpenAI, and then query the graph using a locally hosted language model and embedding model at the customer site?
It's a lot of LLM calls. 8 minutes sounds reasonable for data that size; it's making LLM calls for every node (which likely works out to every page). It can only run so much in parallel before you hit rate limits. There is a way to increase the concurrency, but tbh I wouldn't recommend touching it too much:
Python
# passed as an argument to PropertyGraphIndex.from_documents(...)
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.llms.openai import OpenAI

kg_extractors=[
    SchemaLLMPathExtractor(
        llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
        num_workers=4,  # default is 4; raise for more concurrency
    )
],
You can change the LLM at any time, but for embeddings, you need to use the same model during creation/indexing and querying.
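A minimal sketch of that split; the model names and data path are placeholders, and Settings is used so the same embedding model applies at both index time and query time:
Python
# build time: strong hosted LLM for extraction, local embeddings throughout
from llama_index.core import PropertyGraphIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.llms.openai import OpenAI

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model  # same model for indexing and querying

documents = SimpleDirectoryReader("./data").load_data()
index = PropertyGraphIndex.from_documents(
    documents,
    llm=OpenAI(model="gpt-4o"),  # extraction LLM; swappable later
)

# at the customer site: swap in a local LLM, keep the same embed_model
query_engine = index.as_query_engine(llm=Ollama(model="llama3.1"))
print(query_engine.query("What does the corpus say about X?"))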
Thanks for the above info. Given the rate limits on all API-hosted models, I was thinking of using local language and embedding models. When I use them, it throws the following "ReadTimeout" error:
[Attachment: image.png]
Maybe you need a longer request_timeout? I'm sure the full stack trace shows which module is timing out; I just can't see it in the screenshot.
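If the local model is served through Ollama, for instance, the timeout lives on the client (the model name here is a placeholder):
Python
from llama_index.llms.ollama import Ollama

# give slow local extraction calls more headroom than the default timeout
llm = Ollama(model="llama3.1", request_timeout=300.0)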
I increased request_timeout by 200 and it does go through to 100% now, but it returns an empty string when I query the resulting graph index, and nothing prints in the retrieved nodes.
[Attachments: image.png, image.png]
Same empty response with PropertyGraphIndex.from_existing().
[Attachment: image.png]
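For reference, a minimal from_existing sketch; the connection details are placeholders, and embed_model must be the same model used at build time:
Python
from llama_index.core import PropertyGraphIndex
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

graph_store = Neo4jPropertyGraphStore(
    username="neo4j", password="password", url="bolt://localhost:7687"
)
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    embed_model=embed_model,  # same embedding model used when building
)
# quick check: an empty list here means no graph relations are being retrieved
print(index.as_retriever(include_text=False).retrieve("test query"))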
Apart from your next suggestions, how do I get the full stack trace to see which module is timing out?
To me this says no graph relations were retrieved (which is possible, especially with open-source models building the index; I suggest not using the schema extractor with small open-source models).
So just remove the schema extractor from the code below and replace it with just "llm=llm"?
[Attachment: image.png]
I think so, the default is a lot less demanding
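If I'm reading the defaults right, that amounts to the following sketch; with no kg_extractors given, llama-index falls back to SimpleLLMPathExtractor plus ImplicitPathExtractor:
Python
index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,  # defaults: SimpleLLMPathExtractor + ImplicitPathExtractor
    embed_model=embed_model,
)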
Okay, it looks like I have two ways to go from here: A) scrap the schema extractor so that small LLMs can create the graph index, though the answer quality at inference time won't be as good as the next option; or B) use a bigger LLM, ideally one of the best (OpenAI, Anthropic, etc.), to create the index, then switch back to small local LLMs for inference. No matter which option I pick, I have to use the same local embedding model for both index creation and inference, since A) I can't use an API embedding model during inference and B) both steps (index creation + inference) must use the same embedding model. Please correct me if I got anything wrong, thanks.
yea that about sums it up
Got it. One last thing as I'm learning LlamaIndex: how do I see the full stack trace, per your comment "I'm sure the full stack trace shows what module is timing out, I just can't see it in the screenshot"?
Your screenshot just cut off the bottom of the trace lol

Sometimes notebooks also truncate the middle of the traceback (very annoying; you'll see three dots somewhere in the middle). Usually there's a button at the bottom of the log saying "view full" or "view as scrollable element".
Oh, I see what you meant; I thought there was a piece of code I was missing to produce the full traceback. Thanks.
I changed the default OpenAI model to o1 and ran into the following error:
[Attachments: image.png, image.png]
o1 doesn't support function calling. Use the DynamicLLMPathExtractor (but also, using o1 for this is mega overkill).
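A sketch of swapping it in; DynamicLLMPathExtractor prompts for triplets as plain text rather than relying on function calling (the parameter values shown are illustrative):
Python
from llama_index.core.indices.property_graph import DynamicLLMPathExtractor

kg_extractor = DynamicLLMPathExtractor(
    llm=llm,  # works with models that lack function calling, e.g. o1
    max_triplets_per_chunk=20,
    num_workers=4,
)
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=embed_model,
)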
Okay. I'm running into a new error; I guess I should contact the Neo4j devs?
Maybe? If I had to guess, some name or entity or relation got extracted as a blank string.
Hot take: I dislike graphs, and I don't think the effort and compute are worth it (in a majority of use-cases) 😁 The hype-sphere really over-promises and under-delivers here.

Anyways, I can probably fix this by filtering out blanks in the extractor source code.
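Something along these lines, as a sketch of the idea rather than the actual patch; drop_blank_triplets is a hypothetical helper applied to whatever (subject, relation, object) triplets an extractor parses out:
Python
# hypothetical post-filter: discard triplets where any name parsed as blank
def drop_blank_triplets(
    triplets: list[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    return [
        (subj, rel, obj)
        for subj, rel, obj in triplets
        if subj.strip() and rel.strip() and obj.strip()
    ]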