Updated 6 months ago

2) When using a LlamaIndex doc loader I

At a glance

The community member is using a LlamaIndex document loader and wants to see which files loaded, which did not, and why any were skipped. They ran an .md loader over a directory of sub-directories and found that only 80% of the files were loaded. Another community member suggests checking the document.metadata of each loaded document object for more details. The community members also discuss what happens when files share a name across different sub-directories, and one suggests passing recursive=True to SimpleDirectoryReader.

2) When using a LlamaIndex doc loader, I want to see which files loaded and which did not. What log settings are best for this? For example, I used an .md loader on a directory of sub-directories and it loaded 80%. I need to know what was skipped and why.
6 comments
It will always load everything, and print things that were skipped. You can check the document.metadata of each loaded document object for more details.
What if each file has the same name in different sub-directories? I found 120 files but only 80 were parsed, and I saw no skipped ones listed.
It has the full path:
>>> from llama_index.core import SimpleDirectoryReader
>>> documents = SimpleDirectoryReader("./docs/docs/examples/data/paul_graham").load_data()
>>> documents[0].metadata
{'file_path': '/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-16', 'last_modified_date': '2024-04-16'}
I was using the .md loader from the hub because SimpleDirectoryReader saw even fewer files. I will try again.
Try recursive=True in SimpleDirectoryReader.
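For anyone landing here with the same question, one way to see exactly which files did not make it in is to compare the files on disk with the file_path values in the loaded documents' metadata. This is only a rough sketch: find_unloaded and report are hypothetical helper names, not part of LlamaIndex, and it assumes every loaded document carries a 'file_path' metadata entry like the one shown above. The suffix match is case-sensitive on most systems, so ".md" and ".MD" files would need separate checks. The diff tells you which files are missing but not why.

```python
import logging
from pathlib import Path


def find_unloaded(root, loaded_paths, suffix=".md"):
    """Return files under root (searched recursively) with the given suffix
    that do not appear in loaded_paths. Hypothetical helper, not a LlamaIndex API."""
    on_disk = {p.resolve() for p in Path(root).rglob(f"*{suffix}") if p.is_file()}
    loaded = {Path(p).resolve() for p in loaded_paths}
    return sorted(on_disk - loaded)


def report(root, suffix=".md"):
    """Load a directory with SimpleDirectoryReader and print files that produced no document."""
    # Imported here so find_unloaded can be used without llama_index installed.
    from llama_index.core import SimpleDirectoryReader

    # Standard Python logging; surfaces whatever the reader logs while loading.
    logging.basicConfig(level=logging.DEBUG)

    documents = SimpleDirectoryReader(
        root, recursive=True, required_exts=[suffix]
    ).load_data()

    # Assumes each document has a 'file_path' metadata entry, as in the output above.
    loaded_paths = {doc.metadata["file_path"] for doc in documents}
    for path in find_unloaded(root, loaded_paths, suffix):
        print("not loaded:", path)
```

Calling report("./docs") would then list every .md file under ./docs that has no matching document. Because the comparison uses full resolved paths, files with the same name in different sub-directories are counted separately.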