Updated 2 years ago

Pdf error

At a glance

The post asks community member Logan M for advice about a PDF loading error. The comments suggest that PDF files fail to load on a remote server (they come back as empty files) even though they load fine locally, while text files load without problems. Community members discuss potential solutions, such as using a different PDF parser package or loading the PDF directly with a PDF library and converting the text to a document object. One community member provides example code for creating a document object from a PDF text string. However, there is no explicitly marked answer to the original question.

Hi @Logan M can you drop some wisdom abt this 😩 please
14 comments
Lol yea I saw this earlier today but tbh I have no idea

You are running on a remote server, and the pdf won't load on the server but it loads locally?
or is there a version of llamaindex that uses a different pdf parser package? I believe there's a version where it uses PyPDF2?
and it loads a txt file just fine. but with a pdf file, it's always an empty file
You could always load the pdf yourself with a pdf library of your choice, and then convert it to a document object
yeah that helps
how do I convert it to a document object?
because I can load it just fine with a python reader
```python
from llama_index import Document

# Wrap the text you already extracted from the PDF; doc_id and extra_info are optional.
document = Document("my pdf text string", doc_id="optional doc id", extra_info={"optional": "info dict"})
```
You can shove the entire pdf into one document, or split it any way you like and create many document objects
The doc id and extra info are optional (I think I made that clear lol but just making sure)
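To make the "load it yourself, then convert" idea concrete, here is a minimal sketch that reads a PDF page by page and builds one document object per page. It assumes the third-party `pypdf` package as the PDF library (the thread does not say which reader was used) and the same `Document` constructor shown above, which matches the llama_index version discussed here and may differ in other versions. The helper names `pages_to_records` and `load_pdf_documents` are made up for illustration.

```python
from typing import Dict, List, Optional


def pages_to_records(page_texts: List[Optional[str]], source: str) -> List[Dict]:
    """Turn per-page text into records, skipping pages with no extractable text."""
    records = []
    for page_number, text in enumerate(page_texts, start=1):
        cleaned = (text or "").strip()
        if not cleaned:
            # Scanned or image-only pages often yield no text; skip them.
            continue
        records.append(
            {
                "text": cleaned,
                "doc_id": f"{source}-page-{page_number}",
                "extra_info": {"source": source, "page": page_number},
            }
        )
    return records


def load_pdf_documents(path: str) -> list:
    """Hypothetical helper: read a PDF with pypdf and wrap each page in a Document."""
    # Imported here so the pure-Python helper above works without these packages.
    from pypdf import PdfReader  # assumption: pypdf is installed
    from llama_index import Document  # same import as in the snippet above

    reader = PdfReader(path)
    page_texts = [page.extract_text() for page in reader.pages]
    return [
        Document(r["text"], doc_id=r["doc_id"], extra_info=r["extra_info"])
        for r in pages_to_records(page_texts, source=path)
    ]
```

If every page comes back empty, `load_pdf_documents` returns an empty list, which is a quick way to tell whether the problem is in text extraction rather than in the indexing step.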