
Updated 6 months ago

Hi I started using llama index a few

At a glance

A community member is using llama_index and wants to add HTML files to their vector store, but SimpleDirectoryReader does not appear to have specific HTML support. Replies suggest using the BeautifulSoup reader from the hub or the unstructured library, which has a local version that can handle local HTML files.

Hi! I started using llama_index a few days ago and it's great! However, now I'd like to add some HTML files to my vector store, and it looks like SimpleDirectoryReader does not have specific HTML support, nor is there anything on llama-hub. I'm probably missing something? Am I supposed to leave the HTML tags in and treat it like a normal text file?
6 comments
I think you need to use a BeautifulSoup reader for this?
ah nice thanks, didn't see that one on the hub
unstructured should also handle this well for local files (it looks like the other web readers need URLs)
oh, I looked at unstructured but I didn't realize I could use it locally, as I went to the SaaS variant first. Ok nice, one of these will certainly work
unstructured has a local version tho
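For anyone landing here with the same question, here is a rough sketch of the idea behind both suggestions: strip the markup from local HTML files before indexing, rather than feeding raw tags to the vector store. This is not the BeautifulSoup reader or unstructured itself; it uses only Python's standard-library `html.parser` as a stand-in so it runs without extra installs. The names `html_to_text` and `load_html_dir` are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def load_html_dir(directory):
    """Hypothetical helper: map each *.html file name to its extracted text."""
    return {
        path.name: html_to_text(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.html"))
    }
```

The extracted strings could then be wrapped in llama_index documents and indexed as usual. In practice the BeautifulSoup reader or unstructured's local version will likely cope better with messy real-world HTML than this minimal parser.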