
Updated 3 months ago

pdf page metadata

This is probably such a simple question, and the answer is probably written somewhere on the Docs page, but I could not find it. How do I preserve the page number for a long PDF, so that vector search (or any other) results show an excerpt plus a page number? Thank you
5 comments
hello friend
Plain Text
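 from pypdf import PdfReader  # assumed library; PyPDF2 exposes the same class
 from llama_index.core import Document  # assumed; older versions: from llama_index import Document

 documents = []
 with open(path, 'rb') as f:
     pdf = PdfReader(f)
     print("Metadata: ", pdf.metadata)
     # enumerate from 1 so page_number matches the page order in the PDF
     for page_number, page in enumerate(pdf.pages, start=1):
         documents.append(Document(text=page.extract_text(), metadata={"page_number": page_number}))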
Thanks! So I read with PdfReader first, and this code will then add page numbers to the chunks of extracted text?
ya, you'd have to do something like for page_number, page in enumerate(pdf.pages, start=1):
or however you do a for loop in python 🙂
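To make the "excerpt + page number" part concrete, here is a rough, library-free sketch of the idea. Everything in it is an assumption for illustration: `Chunk`, `chunk_pages`, and `search` are hypothetical names, the page texts are made up, and the keyword match only stands in for a real vector search. The point is just that each chunk carries the page number of the page it came from, so any hit can report it. Whether your indexing library copies document metadata onto its chunks is worth checking in its docs.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a library Document/Node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_pages(page_texts, chunk_size=200):
    """Split each page into fixed-size chunks, copying the page number onto each one."""
    chunks = []
    # page_texts stands in for [page.extract_text() for page in pdf.pages]
    for page_number, text in enumerate(page_texts, start=1):
        for start in range(0, len(text), chunk_size):
            piece = text[start:start + chunk_size]
            chunks.append(Chunk(piece, {"page_number": page_number}))
    return chunks


def search(chunks, query, excerpt_len=60):
    """Naive keyword match standing in for vector search; returns (excerpt, page) pairs."""
    hits = [c for c in chunks if query.lower() in c.text.lower()]
    return [(c.text[:excerpt_len], c.metadata["page_number"]) for c in hits]


# Made-up page texts, one string per PDF page
pages = ["Intro to the report.", "Revenue grew in the third quarter.", "Appendix and notes."]
chunks = chunk_pages(pages)
results = search(chunks, "revenue")
print(results)
```

Because the page number is attached at chunking time, a long page that splits into several chunks still reports the same page for every one of them.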