
Updated 3 months ago

JSON

Hello everyone,

I've been working on integrating LlamaIndex with JSON files that have been extracted from PDFs. For context, each text element from the PDF has been chunked, assigned an ID, and given a parent ID to refer to its parent element, making the text structure clear and hierarchical.

However, I run into issues with both approaches I've tried: (1) loading the files with the JsonReader from LlamaHub, building a VectorStoreIndex, and creating a general query_engine, and (2) using the JsonQueryEngine. In both cases LlamaIndex seems to struggle to interpret the 'id' and 'parentId' fields, even when the schema is provided. It also doesn't seem to handle broader queries, such as summarizing all documents, effectively.

Here's a sample of the JSON structure I'm working with.
Has anyone experienced this, or does anyone have suggestions on how to improve the integration? Thanks in advance!
Attachment
image.png
3 comments
Can you share more details about the issues you're having, such as the error or stack trace?
I get an error when I query "what's the file about" with JsonQueryEngine:
82 def p_error(self, t):
---> 83 raise JsonPathParserError('Parse error at %s:%s near token %s (%s)'
84 % (t.lineno, t.col, t.value, t.type))
85

JsonPathParserError: Parse error at 1:4 near token task (ID)

And when I use a general query engine with "What's the parent text of C.2", I always get a response like "The information is not provided."
What can I do to make sure the 'id' and 'parentId' information is picked up?
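
One approach that might help with the parent lookup question: resolve each 'parentId' in plain Python before indexing, and write the parent text and section path into every chunk, so the retriever never has to follow an ID on its own. The sketch below is only an illustration. The field names "id", "parentId", and "text", the sample records, and the helper names are all assumptions, since the real structure is only visible in the attached image.

```python
import json

# Hypothetical sample: the field names "id", "parentId", and "text" are
# assumptions, since the real structure is only shown in the attached image.
SAMPLE = json.loads("""
[
  {"id": "C", "parentId": null, "text": "Appendix C: Test procedures"},
  {"id": "C.2", "parentId": "C", "text": "Calibration steps"},
  {"id": "C.2.1", "parentId": "C.2", "text": "Zero the sensor before use."}
]
""")


def build_lookup(elements):
    """Map each element id to its element so parents resolve in one step."""
    return {el["id"]: el for el in elements}


def ancestor_texts(element, lookup):
    """Return the texts of all ancestors, ordered from root to direct parent."""
    chain = []
    seen = set()  # guards against accidental cycles in the parent links
    parent_id = element.get("parentId")
    while parent_id is not None and parent_id in lookup and parent_id not in seen:
        seen.add(parent_id)
        parent = lookup[parent_id]
        chain.append(parent["text"])
        parent_id = parent.get("parentId")
    return list(reversed(chain))


def enrich(elements):
    """Produce one self-describing text chunk per element."""
    lookup = build_lookup(elements)
    chunks = []
    for el in elements:
        path = ancestor_texts(el, lookup)
        parent = lookup.get(el.get("parentId"))
        parent_text = parent["text"] if parent else "none"
        chunks.append(
            f"ID: {el['id']}\n"
            f"Parent ID: {el.get('parentId')}\n"
            f"Parent text: {parent_text}\n"
            f"Section path: {' > '.join(path) if path else '(root)'}\n"
            f"Text: {el['text']}"
        )
    return chunks


CHUNKS = enrich(SAMPLE)
```

Each string in CHUNKS could then be wrapped in a LlamaIndex Document and indexed with VectorStoreIndex as usual. Because the chunk for C.2 now contains its parent's text literally, a question like "What's the parent text of C.2" has something concrete to retrieve. Whether this also improves summarization would need testing.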