
Updated 8 months ago

what is the best local pdf parsing python library or technology, if i cant send a pdf to l

@kapa.ai What is the best local PDF parsing Python library or technology if I can't send a PDF to LlamaParse?
Please stand by @Adam Hunter as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
If you're looking for a local PDF parsing library in Python, you might want to consider using PyMuPDF. It's a Python binding to the PDF processing library MuPDF. You can use it to read PDF, XPS, and E-book documents. Here's a simple example of how you can use it with LlamaIndex:

Plain Text
from llama_index.readers.file import PyMuPDFReader

# Parse the PDF locally; the result is a list of LlamaIndex Document objects
loader = PyMuPDFReader()
documents = loader.load(file_path="./data/your_file.pdf")


Please note that while PyMuPDF is a powerful library, it may not handle complex PDFs as effectively as LlamaParse. If you're dealing with complex PDFs and can't use LlamaParse, you might need to explore other options or build custom solutions.

You can install PyMuPDF using pip (the `%pip` form below is Jupyter notebook syntax; in a regular shell, drop the `%`):

Plain Text
%pip install pymupdf
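
If you end up building a custom solution, one possible starting point is to call PyMuPDF directly instead of going through the LlamaIndex reader. The following is a minimal sketch, not a definitive implementation: `extract_page_texts` and `to_records` are hypothetical helper names chosen for illustration, and the record layout (`source`, `page`, `text`) is an assumption. It only pulls plain text page by page, so tables and multi-column layouts will likely still need extra handling.

```python
def extract_page_texts(pdf_path):
    """Return a list of plain-text strings, one per page of the PDF."""
    # Imported lazily so the rest of this module loads even where
    # PyMuPDF is not installed. `fitz` is PyMuPDF's import name.
    import fitz

    with fitz.open(str(pdf_path)) as doc:
        return [page.get_text() for page in doc]


def to_records(page_texts, source):
    """Pair each non-empty page text with its 1-based page number.

    Hypothetical helper: the dict layout is an assumption for
    illustration, not a LlamaIndex or PyMuPDF format.
    """
    return [
        {"source": str(source), "page": number, "text": text.strip()}
        for number, text in enumerate(page_texts, start=1)
        if text.strip()  # skip pages with no extractable text
    ]


# Example usage (assumes the file exists and PyMuPDF is installed):
#   texts = extract_page_texts("./data/your_file.pdf")
#   for record in to_records(texts, "./data/your_file.pdf"):
#       print(record["page"], record["text"][:80])
```

Keeping the page number next to each chunk of text makes it easier to trace an answer back to its page later. Pages that come back empty are often scanned images, which would need OCR rather than plain text extraction.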


Please refer to the LlamaIndex documentation for more details.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
From what people are saying, the answer is LlamaParse.