OK, I will export the data from our Confluence and then load it into a vector store. The main issue is that the AI box cannot access Confluence, so I can't run a simple crawler like the one I found in the docs. So the idea is to export the data to files and then read those files, but still have them handled as HTML files.
Yes, I'd like to get roughly the same treatment as if I used something like this: documents = SimpleWebPageReader(html_to_text=True).load_data(["http://paulgraham.com/worked.html"])
Found these two. You can make your own reader, using the other readers as a reference: get the text from each file and pass it to either of the two libraries mentioned above.
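For illustration, here is a rough sketch of what such a custom reader could look like, assuming the Confluence export ends up as a directory of .html files. It uses Python's standard-library html.parser as a stand-in for the libraries mentioned above, so it runs without extra installs. The names html_to_text and load_html_dir are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Hypothetical helper: crude HTML-to-text conversion, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def load_html_dir(export_dir):
    """Hypothetical helper: return (file name, plain text) for each .html file."""
    results = []
    for path in sorted(Path(export_dir).rglob("*.html")):
        # Assumption: the export is UTF-8; undecodable bytes are replaced.
        html = path.read_text(encoding="utf-8", errors="replace")
        results.append((path.name, html_to_text(html)))
    return results
```

Each text returned by load_html_dir could then be wrapped in a LlamaIndex Document (the import path depends on the installed version) and indexed the same way as the output of SimpleWebPageReader. The text conversion here is much cruder than what html_to_text=True produces, so swapping in a proper HTML-to-text library inside html_to_text is probably worthwhile.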