
Updated 3 months ago

Worked with web reader

At a glance

The community member asks whether it is a good approach to build an ingestion pipeline from documents loaded with SimpleDirectoryReader and nodes parsed from HTML files with HTMLNodeParser. In the only comment, the same community member reports that the approach works with a web reader and shares the configuration used, including the driver arguments and the URLs.

Gianluca

Hi, do you think it is a good approach to create an ingestion pipeline with documents from SimpleDirectoryReader and nodes from HTML files parsed with HTMLNodeParser?

1 comment

Gianluca

It works with the web reader 😄


web:
  driver_arguments:
    - --no-sandbox
    - --disable-dev-shm-usage
    - --headless
  urls: 
    - prefix: "file:///app/data/web/confluence-export/Folder"
      base_url: "file:///app/data/web/confluence-export/Folder/index.html"
      max_depth: 10000
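
For anyone reusing this config, here is a minimal sketch of a sanity check on the `urls` entries before starting a crawl. It assumes, since the thread does not document the keys, that `prefix` restricts which links are followed, `base_url` is the start page, and `max_depth` limits how far links are followed. The function name `check_url_entry` is hypothetical, and the config is written as a Python dict so that no YAML library is needed.

```python
from __future__ import annotations

from urllib.parse import urlparse

# The config from the reply above, copied into a plain dict.
WEB_CONFIG = {
    "driver_arguments": ["--no-sandbox", "--disable-dev-shm-usage", "--headless"],
    "urls": [
        {
            "prefix": "file:///app/data/web/confluence-export/Folder",
            "base_url": "file:///app/data/web/confluence-export/Folder/index.html",
            "max_depth": 10000,
        }
    ],
}


def check_url_entry(entry: dict) -> list[str]:
    """Return the problems found in one `urls` entry (empty list if none).

    Hypothetical helper: the meaning of the keys is an assumption.
    """
    problems = []
    prefix, base_url = entry["prefix"], entry["base_url"]
    # Both URLs should use the same scheme (here: file://).
    if urlparse(base_url).scheme != urlparse(prefix).scheme:
        problems.append("prefix and base_url use different URL schemes")
    # A start page outside the prefix would be skipped by a prefix-limited crawl.
    if not base_url.startswith(prefix):
        problems.append("base_url does not start with prefix")
    if not isinstance(entry["max_depth"], int) or entry["max_depth"] < 0:
        problems.append("max_depth must be a non-negative integer")
    return problems


for entry in WEB_CONFIG["urls"]:
    print(entry["base_url"], check_url_entry(entry) or "ok")
```

With the config above the check reports no problems, because the start page lies under the prefix and both use the `file://` scheme.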
