Updated 2 months ago

I'm getting some weird behaviour from SimpleDirectoryReader() with LlamaParse and wondering if it's intentional. When I load just one file, I end up with multiple document objects.

from llama_index.core import SimpleDirectoryReader
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    verbose=True,
)
# Route every .pdf through LlamaParse
file_extractor = {".pdf": parser}
document = SimpleDirectoryReader(
    input_files=[pdf_path],  # pdf_path is ONE file path, i.e. './easy_data/example_file.pdf'
    file_extractor=file_extractor,
    filename_as_id=True,
).load_data(show_progress=True)

However, when I run len(document) I get a number > 1, which doesn't make sense for a single file. Any ideas what's going on?
3 comments
split_by_page defaults to True, so you get one document object per page of the PDF rather than one per file.
LlamaParse(..., split_by_page=False)
will avoid that
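If you'd rather keep the per-page split and combine the pages afterwards, something like the sketch below might work. It is only an illustration: PageDoc is a hypothetical stand-in for the real document objects, merge_pages_by_file is a made-up helper, and it assumes each page carries a "file_name" entry in its metadata.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PageDoc:
    """Hypothetical stand-in for a loaded document object (text plus metadata)."""
    text: str
    metadata: dict = field(default_factory=dict)


def merge_pages_by_file(docs, separator="\n\n"):
    """Join per-page documents that share a file_name into one document per file."""
    grouped = defaultdict(list)
    for doc in docs:
        # Assumption: every page records its source file under "file_name".
        grouped[doc.metadata.get("file_name", "")].append(doc.text)
    return [
        PageDoc(text=separator.join(texts), metadata={"file_name": name})
        for name, texts in grouped.items()
    ]


# Two pages from the same (illustrative) file collapse into one document.
pages = [
    PageDoc("# Page 1", {"file_name": "example_file.pdf"}),
    PageDoc("# Page 2", {"file_name": "example_file.pdf"}),
]
merged = merge_pages_by_file(pages)
print(len(merged))
```

Setting split_by_page=False is still the simpler fix; merging by hand only makes sense if you also want the individual pages for something else.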