Updated 2 years ago

Does it use the pdftotext tool under the hood?

At a glance

The post asks if the tool uses the pdftotext tool under the hood. The community members discuss the use of the SimpleDirectoryReader to extract text from PDF files, and whether it relies on other libraries like pdftotext or xpdf. Some members mention that the tool does lazy importing of libraries like pdftotext, so users would need to have it installed. There are also questions about the release timeline and plans for features like backlinks to the original PDF sections. However, there is no explicitly marked answer to the original question.

does it use pdftotext tool under the hood?
7 comments
oh you just use SimpleDirectoryReader over a folder of .pdf files, it'll extract the text
is SDR standalone code? or does it get functionality from other libraries/code like pdftotext or https://www.xpdfreader.com/?
it does lazy importing of libraries like pdftotext, so if you do have .pdf files you'll need to have pdftotext installed
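
To illustrate what "lazy importing" means here, below is a minimal sketch of the pattern. It is not GPT Index's actual code: the function names `read_pdf` and `read_folder` are hypothetical, and the only third-party call assumed is the `pdftotext.PDF(file_object)` constructor from the `pdftotext` Python package. The point is that the import happens inside the PDF branch, so the dependency is only needed once a .pdf file is actually read.

```python
from pathlib import Path


def read_pdf(path: Path) -> str:
    """Extract text from one PDF (hypothetical helper, not GPT Index code)."""
    # Lazy import: pdftotext is only required when a .pdf is actually read.
    try:
        import pdftotext
    except ImportError as exc:
        raise ImportError(
            "pdftotext is required to read .pdf files (pip install pdftotext)"
        ) from exc
    with path.open("rb") as f:
        pages = pdftotext.PDF(f)  # iterable of per-page strings
        return "\n\n".join(pages)


def read_folder(folder: str) -> dict[str, str]:
    """Map each file name in `folder` to its extracted text."""
    texts = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() == ".pdf":
            texts[path.name] = read_pdf(path)
        else:
            # Non-PDF files are read as plain text; no extra dependency needed.
            texts[path.name] = path.read_text(errors="ignore")
    return texts
```

With this layout, a folder containing only plain-text files works without pdftotext installed, while a folder containing .pdf files fails with a clear install hint if the package is missing.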
i think this is not yet released. when is the tentative plan to release the reader?
any plan to create/store backlinks so that one can easily go back from the text to the relevant pdf section?
@Hioko it should be there! what version of gpt index are you on? again just use SimpleDirectoryReader over your .pdf files
i'm not using it yet. I will check it out on google colab. i'm not sure how much space it (gpt-index) needs to work on a local computer