Hi everyone, I'm trying to load different types of source files using different readers. I just got this error for the HTMLTagReader:
Failed to load file NAME with error: HTMLTagReader.load_data() missing 1 required positional argument: 'file'. Skipping...
Now I'm second-guessing my function:
def document_loader(docs_relative_path):
    # Define custom readers
    # Readers found in https://llamahub.ai/?tab=readers
    class MyHTMLTagReader(HTMLTagReader):
        pass

    class MyJSONReader(JSONReader):
        pass

    class MyPPTReader(PptxReader):
        pass

    class MyXMLReader(XMLReader):
        pass

    # Currently just for .pdf
    # LlamaCloud account
    parser = LlamaParse(
        api_key="",
        result_type="text",
        verbose=True,
    )

    # Create custom file extractors dictionary
    file_extractors = {
        ".html": MyHTMLTagReader,
        ".json": MyJSONReader,
        ".pdf": parser,
        ".pptx, .ppt": MyPPTReader,
        ".xml": MyXMLReader,
    }

    # Initialize SimpleDirectoryReader with custom file extractors
    # SimpleDirectoryReader reads any files it finds, treating them all as text.
    # It explicitly supports: .csv, .docx, .epub, .hwp, .ipynb, .jpeg, .jpg,
    # .mbox, .md, .mp3, .mp4, .pdf, .png, .ppt, .pptm, .pptx
    reader = SimpleDirectoryReader(
        input_dir=docs_relative_path,
        file_extractor=file_extractors,
        filename_as_id=False,
    )

    # Load documents
    documents = reader.load_data()
    print("Number of documents loaded:", len(documents))

    # Do further processing with loaded documents
    return documents
Any tips?
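One likely cause, sketched with a stand-in class rather than the real LlamaIndex readers: the `file_extractors` dict holds reader classes (only `parser` is an instance), so `load_data` ends up being called on the class itself. The file path then binds to `self` and `file` is reported as missing. A second probable issue is the `".pptx, .ppt"` key, which is one string and would not match either suffix. `StubReader` and `build_extractors` are hypothetical names for illustration only, and the sketch assumes that the directory reader looks up each file's lowercase suffix in the dict and calls `load_data` on whatever it finds.

```python
from pathlib import Path


class StubReader:
    """Hypothetical stand-in for a reader class such as HTMLTagReader."""

    def load_data(self, file):
        return [f"parsed {Path(file).name}"]


def build_extractors(readers_by_suffixes):
    """Map each individual suffix to a reader *instance*.

    `readers_by_suffixes` maps a tuple of suffixes to a reader class
    (an assumed shape, chosen here to avoid combined keys like ".pptx, .ppt").
    """
    extractors = {}
    for suffixes, reader_cls in readers_by_suffixes.items():
        reader = reader_cls()  # instantiate once, share across its suffixes
        for suffix in suffixes:
            extractors[suffix.lower()] = reader
    return extractors


# Calling load_data on the class reproduces the kind of error in the post:
# the path is bound to `self`, so `file` is missing.
try:
    StubReader.load_data("page.html")
except TypeError as err:
    class_call_error = str(err)

extractors = build_extractors({
    (".html",): StubReader,
    (".pptx", ".ppt"): StubReader,
})

# Look up by the file's exact lowercase suffix, one key per extension.
suffix = Path("slides.PPT").suffix.lower()
result = extractors[suffix].load_data("slides.PPT")
```

Applied to the function above, that would mean writing `".html": MyHTMLTagReader()` (with parentheses, and likewise for the other readers) and giving `".pptx"` and `".ppt"` their own keys, assuming those readers need no required constructor arguments in your installed version.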