Updated last year

I am building a query engine with

At a glance

The community member is building a query engine with LlamaIndex, based on a large database of PDFs, and would like to be able to retrieve the page number with the sources. The comments suggest that the community member can use metadata to achieve this. One community member advises defining the metadata when building the nodes, and then printing the metadata for the source nodes after getting the response. Another community member confirms that the PDF reader should automatically split by page and put the page number in the metadata, but if not, the community member may need to write a custom loader to set the metadata as desired.

I am building a query engine with LlamaIndex, based on a large database of PDFs, and I would like to be able to retrieve the page number with the sources @Logan M
4 comments
Hello, you can use the metadata. Define the metadata when you build the nodes. Then, once you have the response, you can print the metadata of each source node:

# Show cited passages that were used to construct the response.
for node in streaming_response.source_nodes:
    print(f"Metadata: {node.metadata}")
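To turn that raw metadata into readable citations, a small helper along these lines may be enough. This is a minimal sketch, not part of the original reply: `format_sources` is a hypothetical name, and the keys `file_name` and `page_label` are assumptions about what the PDF reader stores, so check the printed metadata from your own index first. The stand-in objects below only exist so the sketch runs without LlamaIndex installed.

```python
from types import SimpleNamespace


def format_sources(source_nodes, page_key="page_label", file_key="file_name"):
    """Return one 'file, page N' string per source node.

    Works with any object exposing a `.metadata` dict. The default key
    names are assumptions; adjust them to match your own metadata.
    """
    citations = []
    for node in source_nodes:
        # Tolerate nodes with missing or empty metadata.
        metadata = getattr(node, "metadata", None) or {}
        file_name = metadata.get(file_key, "unknown file")
        page = metadata.get(page_key, "?")
        citations.append(f"{file_name}, page {page}")
    return citations


# Stand-in objects (hypothetical) that mimic nodes with a metadata dict.
fake_nodes = [
    SimpleNamespace(metadata={"file_name": "report.pdf", "page_label": "12"}),
    SimpleNamespace(metadata={}),
]

for line in format_sources(fake_nodes):
    print(line)
```

With a real query engine you would pass `response.source_nodes` instead of `fake_nodes`.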
Yes, I am asking which reader I should use to get the page number, so that I can add it to the metadata.
I think the PDF reader automatically splits by page and puts the page number in the metadata.

Otherwise, you might have to write your own small loader to set the metadata the way you want.
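If a custom loader does turn out to be necessary, the core idea is to create one record per page and attach the page number as metadata. The sketch below assumes the page text has already been extracted by whatever PDF library you use; `PageRecord` and `build_page_records` are hypothetical names, and `PageRecord` is only a stand-in for a LlamaIndex document object carrying text and metadata. The metadata keys are likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Hypothetical stand-in for a document object: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def build_page_records(file_name, page_texts, first_page=1):
    """Create one record per non-empty page, tagged with its page number."""
    records = []
    for page_number, text in enumerate(page_texts, start=first_page):
        if not text.strip():
            # Skip blank pages; later pages keep their original numbers.
            continue
        records.append(
            PageRecord(
                text=text,
                metadata={"file_name": file_name, "page_label": str(page_number)},
            )
        )
    return records


# Example with already-extracted page texts (the second page is blank).
records = build_page_records("manual.pdf", ["Intro text", "   ", "Chapter one"])
for record in records:
    print(record.metadata)
```

Keeping the page number in the metadata, rather than in the text itself, means it travels with each node and shows up when you inspect the source nodes of a response.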
okay okay thanks, yeah found that