
Updated 3 months ago

Pdf reader

I'm upgrading an app from llama-index==0.5.40 to llama-index==0.6.11. With the example below, the entire contents of a 3-page test PDF are added to the index in 0.5.40, but only the first page is added in 0.6.11. I've looked through the docs, but any hints or suggestions are welcome. As part of the upgrade I also switched from PyPDF2==3.0.1 to pypdf==3.9.0.
Plain Text
index = GPTVectorStoreIndex([], service_context=service_context)
document = SimpleDirectoryReader(input_files=[doc_text]).load_data()[0]
index.insert(document)
2 comments
The code for the pdf reader is quite simple...
https://github.com/jerryjliu/llama_index/blob/4e29d1e7a2c55a031bebd1e69c51aebfa2cfdd61/llama_index/readers/file/docs_reader.py#L16

Maybe num_pages is somehow not correct? You could run this code outside of llama index to see where the issue is with your PDF.
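For reference, a minimal sketch of that kind of standalone check, using pypdf directly (PdfReader, reader.pages and page.extract_text() are pypdf calls; the helper names summarize_pages and check_pdf and the path "test.pdf" are made up for illustration). It only reports how many pages pypdf sees and how much text it extracts from each one, so treat it as a diagnostic and not as what llama index does internally.

```python
def summarize_pages(reader):
    """Return (page_number, extracted_character_count) for each page.

    `reader` is expected to look like a pypdf PdfReader: it exposes `.pages`,
    and each page has an `extract_text()` method.
    """
    summary = []
    for number, page in enumerate(reader.pages, start=1):
        # Treat a missing result as empty text so the count is always an int.
        text = page.extract_text() or ""
        summary.append((number, len(text)))
    return summary


def check_pdf(path):
    """Print the page count and per-page text length that pypdf reports."""
    # Imported here so summarize_pages can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    print(f"pypdf reports {len(reader.pages)} page(s)")
    for number, chars in summarize_pages(reader):
        print(f"page {number}: {chars} characters extracted")


# Example call (hypothetical path, replace with your own file):
# check_pdf("test.pdf")
```

If pypdf reports all three pages with text, the PDF itself is probably fine. In that case it would be worth printing len(SimpleDirectoryReader(input_files=[doc_text]).load_data()) in both versions. The snippet in the question keeps only element [0], so if the newer reader returns more than one Document for the file (for example one per page, which I have not verified for 0.6.11), only the first would be inserted.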
Thank you @Logan M