2) when using a LlamaIndex doc loader I

At a glance

A community member is using a LlamaIndex document loader and wants to see which files loaded and which did not, along with the reason for any skipped files. They ran an .md loader on a directory of subdirectories and found that only 80% of the files were loaded. Another community member says the loader prints any files it skips and suggests checking the document.metadata of each loaded document object for more details. The discussion also covers files that share a name across subdirectories, and one member suggests passing recursive=True to SimpleDirectoryReader.

dean

2) When using a LlamaIndex doc loader, I want to see which files loaded and which did not. What logging settings are best for this? For example, I used an .md loader on a directory of subdirectories and it loaded only 80% of the files. I need to know what was skipped and why.

Logan M

It will always try to load everything and print any files that were skipped. You can check the document.metadata of each loaded document object for more details.
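A file that fails to load or is filtered out produces no Document at all, so one way to see exactly what was left out is to compare the files on disk against the file_path values in the loaded documents' metadata. This is a standard-library sketch; find_unloaded_files and its suffixes parameter are hypothetical names, and it assumes each metadata dict has a 'file_path' key like the example output later in this thread. It compares paths rather than counts, because some readers can return several Document objects for a single file. It will also list files the reader leaves out on purpose (for example hidden files, if the reader is set to skip them), which is usually what you want when auditing a load.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Iterable, Mapping


def find_unloaded_files(
    root: str,
    loaded_metadata: Iterable[Mapping[str, Any]],
    suffixes: tuple[str, ...] = (".md",),
) -> list[str]:
    """List files under `root` (searched recursively) that no loaded document came from.

    Assumption: each metadata dict carries a 'file_path' key. `suffixes`
    should be lowercase; matching is case-insensitive, so '.MD' files count too.
    """
    # Every candidate file on disk, normalized to an absolute path.
    on_disk = {
        str(p.resolve())
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    }
    # Paths that produced at least one Document. A set, because one file
    # can yield several Document objects.
    loaded = {
        str(Path(m["file_path"]).resolve())
        for m in loaded_metadata
        if "file_path" in m
    }
    return sorted(on_disk - loaded)


# Hypothetical usage (requires llama_index, so it is left commented out):
# from llama_index.core import SimpleDirectoryReader
# docs = SimpleDirectoryReader("./notes", recursive=True, required_exts=[".md"]).load_data()
# for path in find_unloaded_files("./notes", [d.metadata for d in docs]):
#     print("not loaded:", path)
```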

dean

What if files in different subdirectories have the same name? I found 120 files, but only 80 were parsed, and I saw no skipped ones listed.

Logan M

The metadata has the full path.
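To confirm that same-named files in different subdirectories were all loaded, the metadata can be grouped by file name. This minimal sketch uses a hypothetical helper, shared_file_names (not a LlamaIndex API), and assumes each metadata dict carries the 'file_name' and 'file_path' keys seen in the example output in this thread. It reports every name that appears at more than one path.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable, Mapping


def shared_file_names(
    loaded_metadata: Iterable[Mapping[str, Any]],
) -> dict[str, list[str]]:
    """Map each file_name found at more than one file_path to those paths."""
    paths_by_name: dict[str, set[str]] = defaultdict(set)
    for meta in loaded_metadata:
        # A set collapses repeated Documents that came from the same file.
        paths_by_name[meta["file_name"]].add(meta["file_path"])
    return {
        name: sorted(paths)
        for name, paths in paths_by_name.items()
        if len(paths) > 1
    }
```

With documents from a reader, the call would look like shared_file_names([d.metadata for d in documents]). An empty result means no name collisions among the files that did load.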

Logan M

>>> from llama_index.core import SimpleDirectoryReader
>>> documents = SimpleDirectoryReader("./docs/docs/examples/data/paul_graham").load_data()
>>> documents[0].metadata
{'file_path': '/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-16', 'last_modified_date': '2024-04-16'}
>>>

dean

I was using the .md loader from LlamaHub because SimpleDirectoryReader found even fewer files. I will try again.

Muhammad Haseeb

Try passing recursive=True to SimpleDirectoryReader.
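SimpleDirectoryReader's recursive parameter defaults to False, so without it only files directly inside the input directory are read. A quick way to tell whether that explains a shortfall is to count matching files at the top level versus in the whole tree. This standard-library sketch uses a hypothetical helper, count_matching_files; a large gap between the two numbers suggests the reader needs recursive=True.

```python
from __future__ import annotations

from pathlib import Path


def count_matching_files(root: str, suffix: str = ".md") -> tuple[int, int]:
    """Return (top-level count, recursive count) of files ending in `suffix`.

    Matching is case-insensitive, so '.MD' files are counted as '.md'.
    """
    suffix = suffix.lower()
    root_path = Path(root)
    # Files directly inside `root` only.
    top_level = sum(
        1 for p in root_path.iterdir() if p.is_file() and p.suffix.lower() == suffix
    )
    # Files anywhere under `root`, including nested subdirectories.
    recursive = sum(
        1 for p in root_path.rglob("*") if p.is_file() and p.suffix.lower() == suffix
    )
    return top_level, recursive
```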
