A community member using llama_index wants to add HTML files to their vector store, but SimpleDirectoryReader does not have specific HTML support. Other community members suggest using a BeautifulSoup reader or the unstructured library, which has a local version that can handle local HTML files.
Hi! I started using llama_index a few days ago and it's great! However, now I'd like to add some HTML files to my vector store, and it looks like SimpleDirectoryReader does not have specific HTML support, and I can't find anything on llama-hub either. I'm probably missing something? Am I supposed to leave the HTML tags in and treat it like a normal text file?
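
On the last question: rather than indexing raw markup, it probably makes sense to strip the tags first, which is roughly what a BeautifulSoup reader or unstructured would do for you. Here is a minimal sketch of that idea using only Python's standard-library `html.parser`, as a stand-in for those tools. `TextExtractor` and `html_to_text` are hypothetical names made up for this example and are not part of llama_index.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

You could read each `.html` file, pass its contents through `html_to_text`, and hand the resulting plain text to llama_index the same way you would a normal text file. For messy real-world pages, BeautifulSoup or unstructured will likely give cleaner results than this sketch.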