Updated 2 months ago

Hi! Can you help me choose a suitable class/loader to ingest PDF file(s) sent from the frontend?

Plain Text
from typing import List

from fastapi import File, UploadFile


async def upload_files(pdf_files: List[UploadFile] = File(...)):
    for pdf_file in pdf_files:
        # Which loader do I need to use here? Do I first have to read the
        # text contents myself and create a list of Documents?
        ...
2 comments
I forget what format these uploaded files are in. To use them with an existing PDF loader from llama-hub, they might have to be on disk 🤔
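If a path-based loader is the goal, one way to get the upload onto disk is to spool its bytes into a temporary file. This is only a minimal sketch: it assumes the upload has already been read into `bytes`, and the helper name `bytes_as_temp_pdf` is made up for illustration.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def bytes_as_temp_pdf(content: bytes) -> Iterator[str]:
    """Write uploaded bytes to a temporary .pdf file and yield its path.

    Hypothetical helper, not part of FastAPI or llama-hub.
    """
    # delete=False so the file can be reopened by path on every platform
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(content)
        tmp.close()  # flush to disk before handing the path out
        yield tmp.name
    finally:
        tmp.close()  # harmless if already closed
        os.remove(tmp.name)  # clean up once the caller is done
```

Inside the endpoint this would look like `with bytes_as_temp_pdf(await pdf_file.read()) as path:`, with `path` then handed to whichever path-based loader you use. The file is deleted as soon as the `with` block exits, so any parsing has to happen inside it.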

I would probably use unstructured or pdfminer/pdfplumber directly and load the extracted text into Document objects
@Logan M I think the content itself is a byte array, because to get at the actual contents, I need to call:

Plain Text
content = await pdf_file.read()


or something like that (I don't remember exactly either)


What I mean is: do I have to parse them manually using a PDF reader library (e.g. pdfplumber, PyPDF, etc.) and create a list of Documents myself, or is there a loader that can do it automatically?

Seems like the latter option is correct 🙂
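For comparison, the manual route discussed above (read the bytes, parse them with a PDF library, build the documents yourself) can be sketched as follows. The assumptions: pypdf is installed and used as the parser (any of the libraries mentioned would do), the function names `pypdf_page_texts` and `bytes_to_records` are invented for this example, and the output is plain dicts rather than real Document objects so the sketch does not depend on a particular llama_index version.

```python
import io
from typing import Callable, Dict, List


def pypdf_page_texts(content: bytes) -> List[str]:
    """Extract the text of each page with pypdf (third-party, imported lazily)."""
    from pypdf import PdfReader  # assumption: pypdf is installed

    reader = PdfReader(io.BytesIO(content))  # parse straight from memory
    # extract_text() may yield nothing for image-only pages
    return [page.extract_text() or "" for page in reader.pages]


def bytes_to_records(
    content: bytes,
    filename: str,
    page_texts: Callable[[bytes], List[str]] = pypdf_page_texts,
) -> List[Dict[str, object]]:
    """Turn one uploaded PDF into per-page records of text plus metadata."""
    records: List[Dict[str, object]] = []
    for page_number, text in enumerate(page_texts(content), start=1):
        if not text.strip():
            continue  # skip pages with no extractable text
        records.append(
            {"text": text, "metadata": {"file_name": filename, "page": page_number}}
        )
    return records
```

In the endpoint, each file would go through `bytes_to_records(await pdf_file.read(), pdf_file.filename)`. Each record could then be wrapped in a Document, for example `Document(text=r["text"], metadata=r["metadata"])`, though the exact import path and keyword names depend on the llama_index version you have installed, so check them against your install.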