Nodes

At a glance

The community member asked if there is an easy way to get the start and end character index of a node in a document. Another community member responded that if they used a llamaindex text splitter, the start and end character indices should be available as node.start_char_idx and node.end_char_idx.

The original poster then mentioned they are trying to render a PDF and jump between sections based on the node indices, but are having issues with PDF.js, and treating the PDF as HTML interfered with their analysis of the nodes. Another community member pointed to a custom PDF viewer their team had written for an earlier proof of concept (SEC Insights) and linked both its source and its npm package. However, they could not find a specific hook to scroll to locations in the PDF.

In the final comment, the original poster concluded that it might be better to write their own PDF parser, as that would give them the x,y coordinates of elements and let them map results to the original document more accurately than character indices, which may not line up well with the rendered PDF.

maybe goats dont exist

Hey, is there an easy way, from the node, to get the Start / End character index in the document?

8 comments

Logan M

Assuming you used a llamaindex text splitter, it should already be there

node.start_char_idx / node.end_char_idx
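
For anyone unsure what those two fields hold, here is a minimal sketch. It assumes they are offsets into the original document text, with the end index exclusive, so that slicing the text with them gives back the chunk. The `Chunk` class and `split_with_offsets` helper are hypothetical stand-ins written for illustration, not LlamaIndex code; with a real splitter you would just read the two attributes off the node.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical stand-in for a node; the field names mirror the ones mentioned above.
    text: str
    start_char_idx: int
    end_char_idx: int  # assumed exclusive


def split_with_offsets(document: str, chunk_size: int) -> list[Chunk]:
    """Naive fixed-width splitter that records where each chunk came from."""
    chunks = []
    for start in range(0, len(document), chunk_size):
        end = min(start + chunk_size, len(document))
        chunks.append(Chunk(document[start:end], start, end))
    return chunks


document = "Revenue grew 12% in Q3. Operating margin was 31%."
chunks = split_with_offsets(document, chunk_size=24)
for chunk in chunks:
    # Slicing the source with the stored offsets recovers the chunk text.
    assert document[chunk.start_char_idx:chunk.end_char_idx] == chunk.text
```

If the slice does not match the node text in your own pipeline, the offsets are probably relative to a different string than the one you are slicing (for example, text that was cleaned before splitting).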

maybe goats dont exist

Yup, I see it. It wasn't in the BaseNode type I was using, so I was a bit confused.

Thanks Logan

BTW:
I am trying to render a PDF and be able to jump between sections based on the node idxs, but PDF.js isn't working well, and when I tried to just treat it as HTML it kinda messed up my analysis on the nodes.

Do y'all have a good way of doing stuff similar to this, or am I just gonna have to duke it out with PDF.js?

(I am getting metrics out of PDFs, and want to be able to click a metric and highlight / navigate to its source or sources.)
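
One possible way to turn a node offset into a page to jump to is sketched below. It assumes the document text the nodes were built from is the per-page text joined with a known separator, which may not hold for every PDF loader. `build_page_starts` and `locate` are hypothetical helper names, and the sample page texts are made up.

```python
from bisect import bisect_right
from itertools import accumulate


def build_page_starts(page_texts: list[str], separator: str = "\n") -> list[int]:
    """Return the document-level offset at which each page's text begins."""
    # Assumption: the full text is separator.join(page_texts).
    lengths = [len(text) + len(separator) for text in page_texts]
    return [0] + list(accumulate(lengths))[:-1]


def locate(char_idx: int, page_starts: list[int]) -> tuple[int, int]:
    """Map a document-level offset to (zero-based page index, offset within that page)."""
    page = bisect_right(page_starts, char_idx) - 1
    return page, char_idx - page_starts[page]


page_texts = ["Cover page", "Revenue grew 12%.", "Notes"]
page_starts = build_page_starts(page_texts)
page, offset = locate(15, page_starts)  # e.g. a node's start_char_idx
```

This only gets you to the right page. Highlighting the exact span still needs positions from whatever renders the PDF.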

Logan M

Hmm, for a past PoC we made (SEC Insights) we wrote our own PDF viewer for this

Maybe it'll be helpful for you
https://github.com/run-llama/sec-insights/tree/main/frontend/src/components/pdf-viewer

Logan M

I thought we were going to publish this as a component, but I guess we never did? 😅

Logan M

Oh here it is
https://www.npmjs.com/package/@llamaindex/pdf-viewer

Logan M

Hmm I could have sworn there was some hook to scroll to locations, but now I don't see this

But I know sec insights does that, so it's gotta be somewhere 😅

maybe goats dont exist

interesting

maybe goats dont exist

It actually might be better for me to write my own PDF parser, because then I can get x,y coords of stuff and map them more appropriately to the original document, rather than trying to map to character indexes that don't really line up with the reality of the document.
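
A rough sketch of that idea, assuming a parser that emits each word together with its character span and its bounding box on the page. The `Word` record, the `boxes_for_range` helper, and the coordinates are all hypothetical; a real parser's output format will differ.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Word:
    # Hypothetical parser output: one word, its character span, and its box on the page.
    text: str
    start: int
    end: int  # exclusive
    page: int
    box: Box


def boxes_for_range(words: list[Word], start: int, end: int) -> dict[int, Box]:
    """Union the boxes of every word overlapping [start, end), grouped by page."""
    merged: dict[int, Box] = {}
    for word in words:
        if word.start < end and word.end > start:  # half-open interval overlap
            x0, y0, x1, y1 = word.box
            if word.page in merged:
                mx0, my0, mx1, my1 = merged[word.page]
                merged[word.page] = (min(mx0, x0), min(my0, y0), max(mx1, x1), max(my1, y1))
            else:
                merged[word.page] = word.box
    return merged


words = [
    Word("Revenue", 0, 7, page=0, box=(72.0, 700.0, 120.0, 712.0)),
    Word("grew", 8, 12, page=0, box=(124.0, 700.0, 150.0, 712.0)),
    Word("12%", 13, 16, page=0, box=(154.0, 700.0, 176.0, 712.0)),
]
highlight = boxes_for_range(words, start=8, end=16)
```

A single merged box per page is coarse for spans that wrap across lines; grouping by line before merging would give tighter highlights.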
