Updated 3 months ago

Hi I started using llama index a few

Hi! I started using llama_index a few days ago and it's great! However, I'd now like to add some HTML files to my vector store, and it looks like SimpleDirectoryReader does not have specific HTML support, nor does anything on llama-hub. I'm probably missing something? Am I supposed to leave the HTML tags in and treat each file like a normal text file?
6 comments
I think you need to use a BeautifulSoup reader for this?
ah nice thanks, didn't see that one on the hub
unstructured should also handle this well for local files (it looks like the other web readers need URLs)
oh, I looked at unstructured but didn't realize I could use it locally, since I landed on the SaaS variant first. OK nice, one of these will certainly work
unstructured has a local version tho
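For anyone wondering what "don't leave the tags in" looks like in practice, here is a minimal sketch of the tag-stripping step. It deliberately uses Python's standard-library `html.parser` as a dependency-free stand-in. It is not the BeautifulSoup or unstructured reader mentioned above, and those will generally handle messy real-world HTML better. The names `html_to_text` and `load_html_dir` are hypothetical helpers made up for this example, and the files are assumed to be UTF-8.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def load_html_dir(directory: str) -> dict:
    """Hypothetical helper: map each *.html path in a directory to its text."""
    texts = {}
    for path in sorted(Path(directory).glob("*.html")):
        raw = path.read_text(encoding="utf-8", errors="replace")  # assumes UTF-8
        texts[str(path)] = html_to_text(raw)
    return texts
```

The resulting strings can then be wrapped in llama_index `Document` objects and indexed like any other text. The exact import path for `Document` depends on the llama_index version installed, so check the docs for your release.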