Updated 9 months ago

I'd like to embed HTML data from files. How would I do that?
You want to read from HTML files? Could you explain a bit more?
OK. I will export data from our Confluence and then load it into a vector store. The main issue is that the AI box cannot access Confluence, so I can't run a simple crawler like the one I found in the docs. The idea is to export the data into files and then read those files, but still have them handled as HTML files.
For my training project I only had plain text files with no HTML, and I loaded them like this: documents = SimpleDirectoryReader(subdir).load_data()
Now the content will be HTML, and I wonder how to get the files treated as HTML rather than as plain text.
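Since the AI box can't reach Confluence, the exported pages have to be picked up from disk. A minimal sketch, assuming the export is a folder (possibly with subfolders) of UTF-8 .html/.htm files; the folder layout, the encoding and the collect_html_files name are assumptions, not part of any Confluence or LlamaIndex API. It keeps the raw markup together with each file's path, so a later step decides how the HTML is handled.

```python
from pathlib import Path


def collect_html_files(export_dir: str) -> list[dict]:
    """Hypothetical helper: one record per exported .html/.htm file.

    Assumes UTF-8 files; undecodable bytes are replaced rather than raising.
    The raw HTML is kept untouched so it can still be treated as HTML later.
    """
    records = []
    # rglob("*") walks subfolders too, since exports often nest pages by space.
    for path in sorted(Path(export_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            records.append(
                {
                    "file_path": str(path),
                    "html": path.read_text(encoding="utf-8", errors="replace"),
                }
            )
    return records
```

Keeping the markup at this stage leaves the choice open: the HTML can be handed to an HTML-aware reader or stripped to text in a separate step.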
So basically you want to extract the main text from HTML files, right?
Yes, I'd like to get the same treatment as if I used something like this:
documents = SimpleWebPageReader(html_to_text=True).load_data(["http://paulgraham.com/worked.html"])
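The html_to_text=True option above applies to pages the reader downloads from URLs. For files already on disk, one rough stand-in is to strip the markup yourself before building documents. The sketch below uses only Python's standard-library html.parser; the html_to_text function name here is hypothetical, and this is not what SimpleWebPageReader does internally. It simply drops tags (and the contents of script/style) instead of converting headings, links and lists into Markdown-style text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Hypothetical helper: plain text from an HTML string, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, html_to_text("<h1>Hello</h1><p>World &amp; more</p>") gives the two lines "Hello" and "World & more". In LlamaIndex, each resulting string could then be wrapped in a Document(text=...) before indexing; the import path for Document differs between versions, so check the one you have installed.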
Oh cool, thank you very much 🙂