
Rach
Joined September 25, 2024
Has anyone had a positive experience using the Microsoft SharePoint Reader from LlamaHub as a reader/loader? I'm also curious whether anyone has been able to stack the SharePoint reader with other, file-type-specific loaders. I need to pull files from SharePoint, but I want the individual file types to load in the best way possible for future parsing.
1 comment
W
Hi everyone, I'm trying to load different types of source files using different readers. I just got this error for the HTMLTagReader:

Failed to load file NAME with error: HTMLTagReader.load_data() missing 1 required positional argument: 'file'. Skipping...

Now I'm second-guessing my function:
def document_loader(docs_relative_path):
    # Define custom readers
    ## Readers found in https://llamahub.ai/?tab=readers
    class MyHTMLTagReader(HTMLTagReader):
        pass

    class MyJSONReader(JSONReader):
        pass

    class MyPPTReader(PptxReader):
        pass

    class MyXMLReader(XMLReader):
        pass

    # Currently just for .pdf
    ## LlamaCloud account
    parser = LlamaParse(
        api_key="",
        result_type="text",
        verbose=True,
    )

    # Create custom file extractors dictionary
    file_extractors = {
        ".html": MyHTMLTagReader,
        ".json": MyJSONReader,
        ".pdf": parser,
        ".pptx, .ppt": MyPPTReader,
        ".xml": MyXMLReader,
    }

    # Initialize SimpleDirectoryReader with custom file extractors
    ## SimpleDirectoryReader reads any files it finds, treating them all as text.
    ## It explicitly supports: .csv, .docx, .epub, .hwp, .ipynb, .jpeg, .jpg, .mbox,
    ## .md, .mp3, .mp4, .pdf, .png, .ppt, .pptm, .pptx
    reader = SimpleDirectoryReader(
        input_dir=docs_relative_path,
        file_extractor=file_extractors,
        filename_as_id=False,
    )

    # Load documents
    documents = reader.load_data()
    print("Number of documents loaded:", len(documents))

    # Do further processing with loaded documents
    return documents

Any tips?
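
For anyone hitting the same error, here is a minimal sketch of what is probably going on. It does not use LlamaIndex at all: StubHTMLReader, StubPptxReader, and load_file are hypothetical stand-ins, and the dispatch logic (look up the reader by file suffix, then call load_data on it) is an assumption about how a directory reader uses the file_extractor mapping. Under that assumption, two things in the dictionary above look suspicious: the values are classes rather than instances, and ".pptx, .ppt" is one key that no single file suffix will ever match.

```python
from pathlib import Path


class StubHTMLReader:
    """Hypothetical stand-in for a reader such as HTMLTagReader."""

    def load_data(self, file, extra_info=None):
        return [f"parsed {Path(file).name}"]


class StubPptxReader:
    """Hypothetical stand-in for a reader such as PptxReader."""

    def load_data(self, file, extra_info=None):
        return [f"slides from {Path(file).name}"]


def load_file(path, file_extractors):
    """Assumed dispatch: pick the reader by file suffix, then call load_data."""
    reader = file_extractors.get(Path(path).suffix.lower())
    if reader is None:
        # No reader registered for this suffix: fall back to a default load.
        return [f"default text load of {Path(path).name}"]
    return reader.load_data(Path(path), extra_info={})


# Mirrors the original dictionary: a class as a value, and a combined key.
broken = {".html": StubHTMLReader, ".pptx, .ppt": StubPptxReader}

# Reader instances as values, and one key per extension.
fixed = {
    ".html": StubHTMLReader(),
    ".pptx": StubPptxReader(),
    ".ppt": StubPptxReader(),
}

try:
    load_file("page.html", broken)
    class_error = None
except TypeError as exc:
    # Calling load_data on the class binds the path to `self`,
    # so the `file` argument is reported as missing.
    class_error = str(exc)

print(class_error)
print(load_file("deck.pptx", broken))  # combined key never matches ".pptx"
print(load_file("page.html", fixed))
print(load_file("deck.pptx", fixed))
```

If the same reasoning applies to the real function, the change would be to write ".html": MyHTMLTagReader() (and likewise for the other readers) and to give ".pptx" and ".ppt" their own entries. The LlamaParse entry is already an instance, which would explain why only the other readers fail.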
2 comments