The community member asked if there is an easy way to get the start and end character index of a node in a document. Another community member responded that if they used a LlamaIndex text splitter, the start and end character indices should be available as node.start_char_idx and node.end_char_idx.
The original poster then mentioned they are trying to render a PDF and jump between sections based on the node indices, but PDF.js is not working well for this, and treating the PDF as HTML instead disrupted their analysis of the nodes. The community members suggested using a custom PDF viewer they had developed for a previous project, and provided a link to the NPM package for it. However, they were unsure if there was a specific hook to scroll to locations in the PDF.
The final comment suggested that it might be better for the original poster to write their own PDF parser, as that would allow them to get the x,y coordinates of elements and map them more appropriately to the original document, rather than relying on character indices which may not line up well.
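As a rough illustration of jumping to a section from a node's character index, here is a minimal Python sketch that finds which page a character offset falls on. It assumes the document text was built by joining per-page text with a single known separator, which may not match how a given PDF reader assembles its text. The helper names build_page_offsets and page_for_char are hypothetical and not part of LlamaIndex or PDF.js.

```python
from bisect import bisect_right


def build_page_offsets(page_texts, separator="\n"):
    """Return the character offset at which each page starts in the joined text.

    Assumption: the full text is separator.join(page_texts).
    """
    offsets = []
    position = 0
    for text in page_texts:
        offsets.append(position)
        position += len(text) + len(separator)
    return offsets


def page_for_char(offsets, char_idx):
    """Return the 0-based index of the page that contains char_idx."""
    if char_idx < 0:
        raise ValueError("char_idx must be non-negative")
    # bisect_right counts how many pages start at or before char_idx.
    return bisect_right(offsets, char_idx) - 1


# Made-up sample pages, for illustration only.
page_texts = ["Revenue grew 12%.", "Costs fell 3%.", "Outlook is stable."]
full_text = "\n".join(page_texts)
offsets = build_page_offsets(page_texts)

# A value such as node.start_char_idx would be passed here.
start_char_idx = full_text.index("Costs")
target_page = page_for_char(offsets, start_char_idx)
```

A viewer could then scroll to target_page; locating the exact spot within the page needs layout information that character offsets alone do not carry.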
Yup, I see it. It wasn't in the BaseNode type I was using, so I was a bit confused.
Thanks Logan
BTW: I am trying to render a PDF and be able to jump between sections based on the node indices, but PDF.js isn't working well, and when I tried to just treat it as HTML it kinda messed up my analysis on the nodes.
Do y'all have a good way of doing stuff similar to this, or am I just gonna have to duke it out with PDF.js?
(I am getting metrics out of PDFs, and want to be able to click a metric and highlight / navigate to its source or sources.)
It actually might be better for me to write my own PDF parser, because then I can get x,y coords of stuff and map that more appropriately to the original document, rather than trying to map to character indexes that don't really line up with the reality of the document.
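To make the x,y idea concrete, here is a small Python sketch of what such a parser's output could look like: each word keeps both its character span and its box on the page, so a character range can be turned into a highlight rectangle. Everything here is an assumption for illustration: the WordBox type, the helper names, and the sample coordinates are made up, and a real parser would have to produce these records itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Hypothetical record emitted by a custom parser for one word."""
    page: int
    start: int  # character offset in the extracted text, inclusive
    end: int    # character offset in the extracted text, exclusive
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_for_span(words, start, end):
    """Return the words whose character span overlaps [start, end)."""
    return [w for w in words if w.start < end and w.end > start]


def bounding_rects(words):
    """Merge word boxes into one enclosing rectangle per page."""
    rects = {}
    for w in words:
        if w.page not in rects:
            rects[w.page] = (w.x0, w.y0, w.x1, w.y1)
        else:
            x0, y0, x1, y1 = rects[w.page]
            rects[w.page] = (min(x0, w.x0), min(y0, w.y0),
                             max(x1, w.x1), max(y1, w.y1))
    return rects


# Made-up boxes for the text "Revenue grew 12%." on page 0.
words = [
    WordBox(0, 0, 7, 72.0, 700.0, 120.0, 712.0),
    WordBox(0, 8, 12, 124.0, 700.0, 150.0, 712.0),
    WordBox(0, 13, 17, 154.0, 700.0, 180.0, 712.0),
]

# Highlight the characters covering "grew 12%." (offsets 8 to 17).
highlight = bounding_rects(boxes_for_span(words, 8, 17))
```

Keeping the character span next to the coordinates means a node's start and end indices can still be used as the lookup key, while the rectangle drives the highlight in the viewer.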